Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer
Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang, Jin-Hwa Kim
Abstract
Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse Graph Learning (SGL) method to formulate visual dialog as a graph structure learning task. SGL infers inherently sparse dialog structures by incorporating binary and score edges and leveraging a new structural loss function. Next, we introduce a Knowledge Transfer (KT) method that extracts the answer predictions from the teacher model and uses them as pseudo labels. We propose KT to remedy the shortcomings of single ground-truth labels, which severely limit the ability of a model to obtain multiple reasonable answers. As a result, our proposed model significantly improves reasoning capability compared to baseline methods and outperforms the state-of-the-art approaches on the VisDial v1.0 dataset. The source code is available at https://github.com/gicheonkang/SGLKT-VisDial.
- Anthology ID:
- 2021.findings-emnlp.31
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 327–339
- URL:
- https://aclanthology.org/2021.findings-emnlp.31
- DOI:
- 10.18653/v1/2021.findings-emnlp.31
- Cite (ACL):
- Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang, and Jin-Hwa Kim. 2021. Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 327–339, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer (Kang et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.findings-emnlp.31.pdf
- Code:
- gicheonkang/sglkt-visdial
- Data:
- VisDial