KnowDR-REC: Auditing Knowledge-Conditioned Visual Grounding in Referring Expression Comprehension
Guanghao Jin, Jingpei Wu, Tianpei Guo, Yiyi Niu, Weidong Zhou, Linyi Yang, Guoyang Liu
Abstract
While Multimodal Large Language Models (MLLMs) have demonstrated the capacity for multimodal reasoning, current Referring Expression Comprehension (REC) benchmarks lag behind, predominantly relying on intra-image cues and neglecting the integration of external world knowledge, which significantly impedes the evolution of REC towards real-world applications. This limitation obscures a model’s true capability to conduct textual reasoning (entity resolution), resolve spatial location (visual grounding), and verify reference validity (hallucination rejection). To address this, we introduce KnowDR-REC, a targeted audit benchmark comprising 1,042 positive triplets derived from real-world knowledge, along with rigorously matched negative samples. Unlike traditional datasets, we implement a controllable counterfactual evaluation mechanism that subjects textual expressions to single-factor perturbations (entity, relation, or time) to test sensitivity to fine-grained factual changes. Extensive evaluation of 18 state-of-the-art MLLMs exposes a critical “binding hallucination,” revealing that current high performance is often built on fragile visual shortcuts rather than true understanding. KnowDR-REC thus serves as a pivotal diagnostic instrument, steering future research toward the genuine integration of perception and reasoning.
- Anthology ID:
- 2026.findings-acl.1923
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 38607–38629
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1923/
- Cite (ACL):
- Guanghao Jin, Jingpei Wu, Tianpei Guo, Yiyi Niu, Weidong Zhou, Linyi Yang, and Guoyang Liu. 2026. KnowDR-REC: Auditing Knowledge-Conditioned Visual Grounding in Referring Expression Comprehension. In Findings of the Association for Computational Linguistics: ACL 2026, pages 38607–38629, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- KnowDR-REC: Auditing Knowledge-Conditioned Visual Grounding in Referring Expression Comprehension (Jin et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1923.pdf