Wenjie Zhong

2021

Leveraging Partial Dependency Trees to Control Image Captions
Wenjie Zhong | Yusuke Miyao
Proceedings of the Second Workshop on Advances in Language and Vision Research

Controlling the generation of image captions has attracted considerable attention recently. In this paper, we propose a framework that leverages partial syntactic dependency trees as control signals so that image captions include specified words and their syntactic structures. To this end, we propose a Syntactic Dependency Structure Aware Model (SDSAM), which explicitly learns to generate the syntactic structures of image captions so that they include the given partial dependency trees. In addition, we introduce a metric to evaluate how many specified words and their syntactic dependencies are included in generated captions. We carry out experiments on two standard datasets: Microsoft COCO and Flickr30k. Empirical results show that image captions generated by our model are effectively controlled in terms of specified words and their syntactic structures. The code is available on GitHub.
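
The abstract does not spell out how the evaluation metric is computed. As a rough illustration only, one plausible reading is a recall-style score: the fraction of specified words, and the fraction of specified dependency edges, that appear in a generated caption. The sketch below assumes that reading. The function name `control_coverage`, the `(head, label, dependent)` edge format, and the convention of returning 1.0 when nothing is specified are all hypothetical choices made for this example, not the definition used in the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical edge format: (head word, dependency label, dependent word).
Edge = Tuple[str, str, str]


def control_coverage(
    specified_words: Set[str],
    specified_edges: Set[Edge],
    caption_words: Iterable[str],
    caption_edges: Iterable[Edge],
) -> Tuple[float, float]:
    """Return (word recall, edge recall) of the control signal in a caption.

    Assumption: an empty control signal counts as fully satisfied (1.0).
    """
    caption_word_set = set(caption_words)
    caption_edge_set = set(caption_edges)

    # Fraction of specified words that occur in the caption.
    if specified_words:
        word_recall = len(specified_words & caption_word_set) / len(specified_words)
    else:
        word_recall = 1.0

    # Fraction of specified dependency edges found in the caption's parse.
    if specified_edges:
        edge_recall = len(specified_edges & caption_edge_set) / len(specified_edges)
    else:
        edge_recall = 1.0

    return word_recall, edge_recall


# Toy example: the caption's parse is written by hand here; in practice it
# would come from a dependency parser run on the generated caption.
word_recall, edge_recall = control_coverage(
    specified_words={"dog", "frisbee"},
    specified_edges={("catches", "nsubj", "dog"), ("catches", "obj", "frisbee")},
    caption_words=["a", "dog", "catches", "a", "ball"],
    caption_edges=[("catches", "nsubj", "dog"), ("catches", "obj", "ball")],
)
```

In this toy case one of the two specified words and one of the two specified edges are covered, so both scores are 0.5. Exact string matching on words and edges is the simplest choice; a real evaluation might instead match on lemmas or ignore dependency labels.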

Co-authors

Yusuke Miyao 1

Venues

alvr1