Abstract
In image captioning, multiple captions are often provided as ground truths, since a valid caption is not always uniquely determined. Conventional methods randomly select a single caption and treat it as correct, and there have been few effective training methods that make use of all the given captions. In this paper, we propose two training techniques for making effective use of multiple reference captions: 1) validity-based caption sampling (VBCS), which prioritizes the use of captions that are estimated to be highly valid during training, and 2) weighted caption smoothing (WCS), which applies smoothing only to the relevant words in the reference caption to reflect multiple reference captions simultaneously. Experiments show that our proposed methods improve CIDEr by 2.6 points and BLEU4 by 0.9 points over the baseline on the MSCOCO dataset.
- Anthology ID:
- 2021.maiworkshop-1.6
- Volume:
- Proceedings of the Third Workshop on Multimodal Artificial Intelligence
- Month:
- June
- Year:
- 2021
- Address:
- Mexico City, Mexico
- Editors:
- Amir Zadeh, Louis-Philippe Morency, Paul Pu Liang, Candace Ross, Ruslan Salakhutdinov, Soujanya Poria, Erik Cambria, Kelly Shi
- Venue:
- maiworkshop
- Publisher:
- Association for Computational Linguistics
- Pages:
- 36–41
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2021.maiworkshop-1.6/
- DOI:
- 10.18653/v1/2021.maiworkshop-1.6
- Cite (ACL):
- Shunta Nagasawa, Yotaro Watanabe, and Hitoshi Iyatomi. 2021. Validity-Based Sampling and Smoothing Methods for Multiple Reference Image Captioning. In Proceedings of the Third Workshop on Multimodal Artificial Intelligence, pages 36–41, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Validity-Based Sampling and Smoothing Methods for Multiple Reference Image Captioning (Nagasawa et al., maiworkshop 2021)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2021.maiworkshop-1.6.pdf
- Data:
- MS COCO
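
The two techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not say how validity is estimated or how "relevant words" are chosen, so everything here is an assumption made for illustration, not the authors' implementation: validity scores are passed in as plain numbers, sampling uses a softmax over those scores, and the relevant words are taken to be the words that occur in the other reference captions. The names `sample_caption`, `smoothed_targets`, `temperature`, and `epsilon` are hypothetical.

```python
import math
import random
from collections import Counter


def sample_caption(captions, validity_scores, temperature=1.0, rng=random):
    """Pick one reference caption, favoring higher validity scores.

    Assumption: sampling weights are a softmax over the given scores.
    """
    top = max(validity_scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s in validity_scores]
    return rng.choices(captions, weights=weights, k=1)[0]


def smoothed_targets(target_caption, other_captions, epsilon=0.1):
    """Build one soft target distribution per word of the target caption.

    Assumption: mass (1 - epsilon) stays on the target word, and epsilon is
    spread over words from the other references in proportion to frequency.
    """
    counts = Counter(w for cap in other_captions for w in cap.split())
    total = sum(counts.values())
    targets = []
    for word in target_caption.split():
        dist = Counter()
        if total == 0:
            dist[word] = 1.0  # no other references: keep a one-hot target
        else:
            dist[word] = 1.0 - epsilon
            for w, c in counts.items():
                dist[w] += epsilon * c / total
        targets.append(dict(dist))
    return targets


if __name__ == "__main__":
    refs = ["a dog runs", "a dog is running", "the cat"]
    print(sample_caption(refs, [0.9, 0.8, 0.1]))
    print(smoothed_targets("a dog runs", ["a dog is running"])[2])
```

In this sketch, standard label smoothing would spread `epsilon` uniformly over the whole vocabulary; restricting it to words from the other references is one possible reading of "smoothing only to the relevant words".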