RECANTFormer: Referring Expression Comprehension with Varying Numbers of Targets
Bhathiya Hemanthage, Hakan Bilen, Phil Bartie, Christian Dondrup, Oliver Lemon
Abstract
The Generalized Referring Expression Comprehension (GREC) task extends classic REC by generating image bounding boxes for objects referred to in natural language expressions, which may indicate zero, one, or multiple targets. This generalization enhances the practicality of REC models for diverse real-world applications. However, the presence of varying numbers of targets in samples makes GREC a more complex task, both in terms of training supervision and final prediction selection strategy. Addressing these challenges, we introduce RECANTFormer, a one-stage method for GREC that combines a decoder-free (encoder-only) transformer architecture with DETR-like Hungarian matching. Our approach consistently outperforms baselines by significant margins in three GREC datasets.
- Anthology ID:
- 2024.emnlp-main.1214
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 21784–21798
- URL:
- https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.emnlp-main.1214/
- DOI:
- 10.18653/v1/2024.emnlp-main.1214
- Cite (ACL):
- Bhathiya Hemanthage, Hakan Bilen, Phil Bartie, Christian Dondrup, and Oliver Lemon. 2024. RECANTFormer: Referring Expression Comprehension with Varying Numbers of Targets. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21784–21798, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- RECANTFormer: Referring Expression Comprehension with Varying Numbers of Targets (Hemanthage et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.emnlp-main.1214.pdf