Bridge the Gap: High-level Semantic Planning for Image Captioning

Chenxi Yuan, Yang Bai, Chun Yuan


Abstract
Recent image captioning models have made considerable progress in exploring multi-modal interaction, for example through attention mechanisms. Although these mechanisms improve the interaction, two gaps remain between the visual and language domains: (1) the gap between visual features and textual semantics, and (2) the gap between the unordered nature of visual features and the ordered nature of text. To bridge these gaps, we propose a high-level semantic planning (HSP) mechanism that incorporates both semantic reconstruction and explicit order planning. We integrate the planning mechanism into an attention-based captioning model and propose the High-level Semantic PLanning based Attention Network (HS-PLAN). First, an attention-based reconstruction module reconstructs the visual features with high-level semantic information. Then a pointer network serializes the features to obtain an explicit order plan that guides generation. Experiments on MS COCO show that our model outperforms previous methods and achieves state-of-the-art performance with a CIDEr-D score of 133.4%.
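The two steps named in the abstract (attention-based reconstruction, then pointer-style serialization) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `reconstruct` and `pointer_order`, the untrained dot-product scoring, the greedy decoding, and the toy vectors are all assumptions made for illustration. A real model would use learned projections and region features from a detector.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(features, concepts):
    """Assumed form of attention-based reconstruction: each visual feature
    attends over semantic concept embeddings and is replaced by their
    attention-weighted sum."""
    dim = len(concepts[0])
    out = []
    for f in features:
        weights = softmax([dot(f, c) for c in concepts])
        out.append([sum(w * c[d] for w, c in zip(weights, concepts))
                    for d in range(dim)])
    return out


def pointer_order(features, start):
    """Greedy pointer-style decoding (hypothetical scoring): at each step,
    point at the not-yet-selected feature that scores highest against the
    current query, then use that feature as the next query."""
    remaining = list(range(len(features)))
    query = start
    order = []
    while remaining:
        probs = softmax([dot(query, features[i]) for i in remaining])
        pick = max(range(len(remaining)), key=probs.__getitem__)
        best = remaining.pop(pick)
        order.append(best)
        query = features[best]
    return order


# Toy usage: three 2-d "region features" and a start query.
regions = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
plan = pointer_order(regions, start=[1.0, 0.0])
print(plan)  # → [0, 2, 1]
```

The returned index list plays the role of the explicit order plan: a decoder could visit the reconstructed features in that order rather than treating them as an unordered set.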
Anthology ID:
2020.coling-main.281
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3157–3167
URL:
https://aclanthology.org/2020.coling-main.281
DOI:
10.18653/v1/2020.coling-main.281
Cite (ACL):
Chenxi Yuan, Yang Bai, and Chun Yuan. 2020. Bridge the Gap: High-level Semantic Planning for Image Captioning. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3157–3167, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Bridge the Gap: High-level Semantic Planning for Image Captioning (Yuan et al., COLING 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.281.pdf
Data:
MS COCO