Multi-grained Attention with Object-level Grounding for Visual Question Answering

Pingping Huang; Jianhui Huang; Yuqing Guo; Min Qiao; Yong Zhu

doi:10.18653/v1/P19-1349

Abstract

Attention mechanisms are widely used in Visual Question Answering (VQA) to search for visual clues related to the question. Most approaches train attention models from a coarse-grained association between sentences and images, which tends to fail on small objects or uncommon concepts. To address this problem, this paper proposes a multi-grained attention method. It learns explicit word-object correspondences through two types of word-level attention that complement the sentence-image association. Evaluated on the VQA benchmark, the multi-grained attention model achieves performance competitive with state-of-the-art models. Visualized attention maps further show that adding object-level grounding leads to a better understanding of the images and locates the attended objects more precisely.
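The abstract describes combining a coarse sentence-image attention with word-level attention that grounds individual question words in image objects. The sketch below is a minimal, hypothetical illustration of that general idea in plain Python, not the authors' model. The dot-product scoring, the single word-level score (the paper uses two types of word-level attention, which are not detailed here), the fixed mixing weight `mix`, and all function names are assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as a hypothetical relevance score."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_level_attention(question: Vector, regions: List[Vector]) -> List[float]:
    # Coarse grain: one question vector is scored against every region.
    return softmax([dot(question, r) for r in regions])


def word_level_attention(words: List[Vector], regions: List[Vector]) -> List[float]:
    # Fine grain (assumed form): each region is scored by its best-matching
    # question word, so one strongly grounded word can pull attention onto it.
    return softmax([max(dot(w, r) for w in words) for r in regions])


def multi_grained_attention(question: Vector, words: List[Vector],
                            regions: List[Vector], mix: float = 0.5) -> List[float]:
    # Hypothetical fusion: a convex combination of the two distributions,
    # which is itself a valid distribution over regions.
    coarse = sentence_level_attention(question, regions)
    fine = word_level_attention(words, regions)
    return [mix * c + (1.0 - mix) * f for c, f in zip(coarse, fine)]


def attended_feature(weights: List[float], regions: List[Vector]) -> Vector:
    # Weighted sum of region features, i.e. the attended visual vector.
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


# Toy example: three region features and two word embeddings (all made up).
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [0.2, 0.1]             # weak, nearly uninformative sentence vector
words = [[0.0, 3.0], [0.1, 0.0]]  # first word matches region 1 strongly

coarse = sentence_level_attention(question, regions)
combined = multi_grained_attention(question, words, regions)
print([round(w, 3) for w in combined])
print(attended_feature(combined, regions))
```

Because the toy sentence vector is nearly uninformative, the coarse distribution is close to uniform and peaks on region 0; the word-level term is what shifts most of the weight onto region 1. How the actual model scores, fuses, and trains its attentions is described in the paper and is not reproduced by this sketch.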

Anthology ID: P19-1349
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 3595–3600
URL: https://aclanthology.org/P19-1349
DOI: 10.18653/v1/P19-1349
Cite (ACL): Pingping Huang, Jianhui Huang, Yuqing Guo, Min Qiao, and Yong Zhu. 2019. Multi-grained Attention with Object-level Grounding for Visual Question Answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3595–3600, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Multi-grained Attention with Object-level Grounding for Visual Question Answering (Huang et al., ACL 2019)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/P19-1349.pdf
Video: https://preview.aclanthology.org/teach-a-man-to-fish/P19-1349.mp4
Data: Visual Question Answering, Visual Question Answering v2.0