L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

An Yan, Xin Wang, Tsu-Jui Fu, William Yang Wang


Abstract
Recent advances in language and vision have pushed research beyond captioning a single image toward describing visual differences between image pairs. Given two images I_1 and I_2 and the task of generating a description W_1,2 comparing them, existing methods directly model the mapping I_1, I_2 -> W_1,2 without any semantic understanding of the individual images. In this paper, we introduce a Learning-to-Compare (L2C) model, which learns to understand the semantic structure of the two images and compare them while learning to describe each one. We demonstrate that L2C benefits from comparing explicit semantic representations and single-image captions, and that it generalizes better to new test image pairs. It outperforms the baseline on both automatic and human evaluation on the Birds-to-Words dataset.
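The abstract describes the L2C idea at a high level: encode each image into a semantic representation, compare the two representations explicitly, and train the difference decoder jointly with an auxiliary single-image captioning objective. Below is a minimal PyTorch sketch of that idea. The class name L2CSketch, the feature dimensions, the subtraction-based comparison, and the GRU decoders are all illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class L2CSketch(nn.Module):
    """Illustrative sketch: compare per-image semantic representations
    while jointly learning to caption each image (assumed design)."""

    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Shared encoder: one semantic representation per image.
        self.encoder = nn.Linear(feat_dim, hidden_dim)
        # Auxiliary decoder for single-image captions.
        self.caption_decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Decoder for the difference description W_1,2.
        self.diff_decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img1_feats, img2_feats, max_len=20):
        # Encode each image individually ("semantic understanding of individuals").
        h1 = torch.tanh(self.encoder(img1_feats))  # (B, H)
        h2 = torch.tanh(self.encoder(img2_feats))  # (B, H)
        # Explicit comparison of the two semantic representations
        # (a simple difference here; the real fusion may differ).
        diff = h1 - h2                             # (B, H)
        dec_in = diff.unsqueeze(1).expand(-1, max_len, -1).contiguous()
        dec_out, _ = self.diff_decoder(dec_in)
        diff_logits = self.out(dec_out)            # (B, T, V)
        # Auxiliary: caption each image from its own representation,
        # so the model also "learns to describe each one".
        c1, _ = self.caption_decoder(h1.unsqueeze(1).expand(-1, max_len, -1).contiguous())
        c2, _ = self.caption_decoder(h2.unsqueeze(1).expand(-1, max_len, -1).contiguous())
        return diff_logits, self.out(c1), self.out(c2)

# Usage: precomputed image features in, token logits out.
model = L2CSketch()
f1, f2 = torch.randn(4, 512), torch.randn(4, 512)
diff_logits, cap1_logits, cap2_logits = model(f1, f2)  # (4, 20, 10000) each
```

Training such a model would sum a cross-entropy loss on the difference description with auxiliary captioning losses on the two single-image decoders; the weighting between these objectives is another assumption not specified in the abstract.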
Anthology ID:
2021.eacl-main.196
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2315–2320
URL:
https://aclanthology.org/2021.eacl-main.196
DOI:
10.18653/v1/2021.eacl-main.196
Cite (ACL):
An Yan, Xin Wang, Tsu-Jui Fu, and William Yang Wang. 2021. L2C: Describing Visual Differences Needs Semantic Understanding of Individuals. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2315–2320, Online. Association for Computational Linguistics.
Cite (Informal):
L2C: Describing Visual Differences Needs Semantic Understanding of Individuals (Yan et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.196.pdf
Data
CUB-200-2011