Abstract
We propose a new visual grounding task called Visual Query Detection (VQD). In VQD, the task is to localize a variable number of objects in an image, where the objects are specified in natural language. VQD is related to visual referring expression comprehension, where the task is to localize only one object. We propose the first algorithms for VQD, and we evaluate them on both visual referring expression datasets and our new VQDv1 dataset.

- Anthology ID: N19-1194
- Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month: June
- Year: 2019
- Address: Minneapolis, Minnesota
- Editors: Jill Burstein, Christy Doran, Thamar Solorio
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 1955–1961
- URL: https://aclanthology.org/N19-1194
- DOI: 10.18653/v1/N19-1194
- Cite (ACL): Manoj Acharya, Karan Jariwala, and Christopher Kanan. 2019. VQD: Visual Query Detection In Natural Scenes. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1955–1961, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal): VQD: Visual Query Detection In Natural Scenes (Acharya et al., NAACL 2019)
- PDF: https://preview.aclanthology.org/ml4al-ingestion/N19-1194.pdf
- Data: VQDv1, MS COCO, RefCOCO, Visual Question Answering
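Because a VQD answer is a variable-length set of boxes (possibly zero), scoring resembles object-detection evaluation: a predicted box counts as correct when it overlaps an unmatched ground-truth box above an IoU threshold. A minimal sketch of that matching step, assuming standard IoU-based evaluation (function names and the greedy matching scheme are illustrative, not taken from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def count_true_positives(pred, gt, thresh=0.5):
    """Greedily match each predicted box to its best unmatched ground-truth
    box; a match at IoU >= thresh is a true positive. Handles the VQD case
    of zero or many boxes per query."""
    unmatched = list(gt)
    tp = 0
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)
            tp += 1
    return tp
```

With true positives in hand, per-query precision and recall follow directly (`tp / len(pred)` and `tp / len(gt)`), with queries that correctly return no boxes handled as a separate exact-match case.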