Abstract
Current image captioning approaches generate descriptions that lack specific information, such as the named entities involved in the images. In this paper, we propose a new task that aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to tackle this problem. We first train a convolutional neural network and long short-term memory (CNN-LSTM) model to generate a template caption based on the input image. Then we use a knowledge-graph-based collective inference algorithm to fill in the template with specific named entities retrieved via the hashtags. Experiments on a new benchmark dataset collected from Flickr show that our model generates news-style image descriptions with much richer information. Our model significantly outperforms unimodal baselines on various evaluation metrics.
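The approach has two stages: a CNN-LSTM generator produces a template caption with typed entity slots, and a knowledge-graph-based collective inference step fills those slots with entities retrieved via the hashtags. The Python sketch below illustrates only the slot-filling interface of the second stage; the `<PER>`/`<LOC>` placeholder format, the `fill_template` helper, and the greedy most-frequent-candidate scorer are illustrative assumptions, standing in for the paper's collective inference over a knowledge graph.

```python
# Minimal sketch of the template-filling stage, assuming the CNN-LSTM stage
# emits a caption with typed slots such as "<PER>" and "<LOC>". The greedy
# frequency-based scorer is a simplification; the paper instead performs
# collective inference over a knowledge graph, which this sketch omits.
import re
from collections import Counter

def fill_template(template: str, candidates: dict[str, list[str]]) -> str:
    """Replace each typed placeholder with the highest-scoring candidate.

    `candidates` maps an entity type (e.g. "PER") to mentions retrieved via
    the image's hashtags; here the most frequent mention wins per slot.
    """
    def best(match: re.Match) -> str:
        etype = match.group(1)
        mentions = candidates.get(etype)
        if not mentions:
            return match.group(0)  # leave the slot if nothing was retrieved
        return Counter(mentions).most_common(1)[0][0]

    return re.sub(r"<([A-Z]+)>", best, template)

# Hypothetical example: a template from the caption generator, with entity
# candidates gathered by linking hashtags to named entities.
template = "<PER> delivers a speech in <LOC>."
candidates = {"PER": ["Barack Obama", "Barack Obama"], "LOC": ["Berlin"]}
print(fill_template(template, candidates))
# -> "Barack Obama delivers a speech in Berlin."
```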
- Anthology ID:
- D18-1435
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4013–4023
- URL:
- https://aclanthology.org/D18-1435
- DOI:
- 10.18653/v1/D18-1435
- Cite (ACL):
- Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, and Shih-Fu Chang. 2018. Entity-aware Image Caption Generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4013–4023, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Entity-aware Image Caption Generation (Lu et al., EMNLP 2018)
- PDF:
- https://aclanthology.org/D18-1435.pdf
- Data
- DBpedia, MS COCO