Thomas Vadora
2021
Region under Discussion for visual dialog
Mauricio Mazuecos | Franco M. Luque | Jorge Sánchez | Hernán Maina | Thomas Vadora | Luciana Benotti
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Visual Dialog is assumed to require the dialog history to generate correct responses during a dialog. However, it is not clear from previous work how dialog history is needed for visual dialog. In this paper, we define what it means for a visual question to require dialog history, and we release a subset of the GuessWhat?! questions whose dialog history completely changes their responses. We propose a novel interpretable representation that visually grounds dialog history: the Region under Discussion. It constrains the image’s spatial features according to a semantic representation of the history inspired by the information structure notion of Question under Discussion. We evaluate the architecture on task-specific multimodal models and the visual transformer model LXMERT.