Change Entity-guided Heterogeneous Representation Disentangling for Change Captioning

Yi Li; Yunbin Tu; Liang Li; Li Su; Qingming Huang

Abstract

Change captioning aims to describe the differences between a pair of images in natural language. However, learning effective difference representations is highly challenging due to distractors such as illumination and viewpoint changes. To address this, we propose a change-entity-guided disentanglement network that explicitly learns difference representations while mitigating the impact of distractors. Specifically, we first design a change entity retrieval module to identify the key objects involved in the change from a textual perspective. Then, we introduce a difference representation enhancement module that strengthens the learned features, disentangling genuine differences from background variations. To further refine the generation process, we incorporate a gated Transformer decoder, which dynamically integrates both visual difference and textual change-entity information. Extensive experiments on the CLEVR-Change, CLEVR-DC, and Spot-the-Diff datasets demonstrate that our method outperforms existing approaches, achieving state-of-the-art performance. The code is available at https://github.com/yili-19/CHEER

Anthology ID: 2025.findings-acl.876
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues: Findings | WS
Publisher: Association for Computational Linguistics
Pages: 17050–17060
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.876/
Cite (ACL): Yi Li, Yunbin Tu, Liang Li, Li Su, and Qingming Huang. 2025. Change Entity-guided Heterogeneous Representation Disentangling for Change Captioning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17050–17060, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Change Entity-guided Heterogeneous Representation Disentangling for Change Captioning (Li et al., Findings 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.876.pdf
