Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning
Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto
Abstract
Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and object labels detected from the images. In previous work, pseudo-captions, i.e., sentences that contain the detected object labels, were assigned to a given image. The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image. In this work, we investigate the effect of removing mismatched words from image-sentence alignment to determine how they make this task difficult. We propose a simple gating mechanism that is trained to align image features with only the most reliable words in pseudo-captions: the detected object labels. The experimental results show that our proposed method outperforms the previous methods without introducing complex sentence-level learning objectives. Combined with the sentence-level alignment method of previous work, our method further improves its performance. These results confirm the importance of careful alignment in word-level details.- Anthology ID:
- 2021.eacl-main.323
- Volume:
- Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
- Month:
- April
- Year:
- 2021
- Address:
- Online
- Editors:
- Paola Merlo, Jorg Tiedemann, Reut Tsarfaty
- Venue:
- EACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 3692–3702
- Language:
- URL:
- https://aclanthology.org/2021.eacl-main.323
- DOI:
- 10.18653/v1/2021.eacl-main.323
- Cite (ACL):
- Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, and Yuji Matsumoto. 2021. Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3692–3702, Online. Association for Computational Linguistics.
- Cite (Informal):
- Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning (Honda et al., EACL 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.eacl-main.323.pdf
- Code
- ukyh/RemovingSpuriousAlignment
- Data
- MS COCO