ISR: Self-Refining Referring Expressions for Entity Grounding
Zhuocheng Yu, Bingchan Zhao, Yifan Song, Sujian Li, Zhonghui He
Abstract
Entity grounding, a crucial task in constructing multimodal knowledge graphs, aims to align entities from knowledge graphs with their corresponding images. Unlike conventional visual grounding tasks that use referring expressions (REs) as inputs, entity grounding relies solely on entity names and types, presenting a significant challenge. To address this, we introduce a novel **I**terative **S**elf-**R**efinement (**ISR**) scheme to enhance a multimodal large language model’s (MLLM’s) capability to generate high-quality REs for the given entities as explicit contextual clues. This training scheme, inspired by human learning dynamics and human annotation processes, enables the MLLM to iteratively generate and refine REs by learning from successes and failures, guided by outcome rewards from a visual grounding model. This iterative cycle of self-refinement avoids overfitting to fixed annotations and fosters continued improvement in referring expression generation. Extensive experiments demonstrate that our method surpasses other methods in entity grounding, highlighting its effectiveness, robustness and potential for broader applications.
- Anthology ID: 2025.acl-long.1483
- Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 30702–30714
- URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1483/
- Cite (ACL): Zhuocheng Yu, Bingchan Zhao, Yifan Song, Sujian Li, and Zhonghui He. 2025. ISR: Self-Refining Referring Expressions for Entity Grounding. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30702–30714, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): ISR: Self-Refining Referring Expressions for Entity Grounding (Yu et al., ACL 2025)
- PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1483.pdf