Object Ordering with Bidirectional Matchings for Visual Reasoning

Hao Tan, Mohit Bansal


Abstract
Visual reasoning with compositional natural language instructions, e.g., based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an accurate mapping between the diverse phrases and the several objects placed in complex arrangements in the image. Further, this mapping needs to be processed to answer the question in the statement given the ordering and relationship of the objects across three similar images. In this paper, we propose a novel end-to-end neural model for the NLVR task, where we first use joint bidirectional attention to build a two-way conditioning between the visual information and the language phrases. Next, we use an RL-based pointer network to sort and process the varying number of unordered objects (so as to match the order of the statement phrases) in each of the three images and then pool over the three decisions. Our model achieves strong improvements (of 4-6% absolute) over the state-of-the-art on both the structured representation and raw image versions of the dataset.
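The two-way conditioning described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the dot-product scoring, the plain-list feature vectors, and the names `bidirectional_attention` and `pool_decisions` are assumptions made for illustration, and max-pooling is only one possible way to "pool over the three decisions". The RL-based pointer network that orders the objects is not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Combine vectors using attention weights."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def bidirectional_attention(objects, words):
    """Return (object contexts, word contexts).

    Assumption: similarity is a plain dot product between an object
    feature vector and a word feature vector of the same dimension.
    """
    sim = [[dot(o, w) for w in words] for o in objects]
    # Object-to-language direction: each object attends over all words.
    obj_ctx = [weighted_sum(softmax(row), words) for row in sim]
    # Language-to-object direction: each word attends over all objects.
    cols = [[sim[i][j] for i in range(len(objects))] for j in range(len(words))]
    word_ctx = [weighted_sum(softmax(col), objects) for col in cols]
    return obj_ctx, word_ctx


def pool_decisions(scores):
    """Hypothetical pooling over per-image scores (max-pooling assumed)."""
    return max(scores)


# Toy inputs: two objects and three words in a shared 2-d feature space.
objects = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
obj_ctx, word_ctx = bidirectional_attention(objects, words)
```

Each object receives a language-conditioned summary and each word receives a vision-conditioned summary, which is the sense in which the attention is bidirectional.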
Anthology ID: N18-2071
Volume: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Month: June
Year: 2018
Address: New Orleans, Louisiana
Editors: Marilyn Walker, Heng Ji, Amanda Stent
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 444–451
URL: https://aclanthology.org/N18-2071
DOI: 10.18653/v1/N18-2071
Cite (ACL): Hao Tan and Mohit Bansal. 2018. Object Ordering with Bidirectional Matchings for Visual Reasoning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 444–451, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal): Object Ordering with Bidirectional Matchings for Visual Reasoning (Tan & Bansal, NAACL 2018)
PDF: https://preview.aclanthology.org/ingest-bitext-workshop/N18-2071.pdf
Video: https://preview.aclanthology.org/ingest-bitext-workshop/N18-2071.mp4
Data: NLVR