Abstract
In this paper, we address the task of news-image captioning, which generates a description of an image given the image and its article body as input. This task is more challenging than conventional image captioning because it requires a joint understanding of image and text. We present a Transformer model that integrates the text and image modalities and attends to textual features from visual features in generating a caption. Experiments based on automatic evaluation metrics and human evaluation show that the article text provides the primary information for reproducing news-image captions written by journalists. The results also demonstrate that the proposed model outperforms the state-of-the-art model. In addition, we confirm that visual features contribute to improving the quality of news-image captions.
- Anthology ID:
- 2020.coling-main.176
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 1941–1951
- URL:
- https://aclanthology.org/2020.coling-main.176
- DOI:
- 10.18653/v1/2020.coling-main.176
- Cite (ACL):
- Zhishen Yang and Naoaki Okazaki. 2020. Image Caption Generation for News Articles. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1941–1951, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Image Caption Generation for News Articles (Yang & Okazaki, COLING 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.coling-main.176.pdf
- Code:
- nlp-titech/news_image_captioning_for_news_articles
- Data:
- Places
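The cross-modal attention mentioned in the abstract, where the model "attends to textual features from visual features", can be sketched as scaled dot-product attention with image-region features as queries and article-token features as keys and values. This is an illustrative sketch only, not the authors' exact architecture; all dimensions, names, and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def cross_attention(visual, textual):
    """Illustrative cross-modal attention (not the paper's exact model):
    visual queries attend over textual keys/values.

    visual:  (n_regions, d) image-region features (queries)
    textual: (n_tokens, d) article-token features (keys and values)
    Returns: (n_regions, d) text-informed visual representations.
    """
    d_k = visual.shape[-1]
    # Similarity of each image region to each article token, scaled by sqrt(d_k).
    scores = visual @ textual.T / np.sqrt(d_k)          # (n_regions, n_tokens)
    # Softmax over the token axis (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each region's output is a weighted sum of token features.
    return weights @ textual                            # (n_regions, d)

# Toy example: 3 image regions attending over 5 article tokens, d = 8.
rng = np.random.default_rng(0)
visual = rng.standard_normal((3, 8))
textual = rng.standard_normal((5, 8))
out = cross_attention(visual, textual)
print(out.shape)  # (3, 8)
```

In a full captioning model this kind of module would feed a Transformer decoder; here it only demonstrates the attention direction described in the abstract.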