Multimodal Sentence Summarization via Multimodal Selective Encoding
Haoran Li, Junnan Zhu, Jiajun Zhang, Xiaodong He, Chengqing Zong
Abstract
This paper studies the problem of generating a summary for a given sentence-image pair. Existing multimodal sequence-to-sequence approaches mainly focus on enhancing the decoder with visual signals, while ignoring that the image can improve the encoder's ability to identify the highlights of a news event or a document. Thus, we propose a multimodal selective gate network that considers reciprocal relationships between textual and multi-level visual features, including a global image descriptor, activation grids, and object proposals, to select highlights of the event when encoding the source sentence. In addition, we introduce a modality regularization to encourage the summary to capture the highlights embedded in the image more accurately. To verify the generalization of our model, we apply the multimodal selective gate to both a text-based decoder and a multimodal-based decoder. Experimental results on a public multimodal sentence summarization dataset demonstrate the advantage of our models over baselines. Further analysis suggests that our proposed multimodal selective gate network can effectively select important information in the input sentence.
- Anthology ID:
- 2020.coling-main.496
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 5655–5667
- URL:
- https://aclanthology.org/2020.coling-main.496
- DOI:
- 10.18653/v1/2020.coling-main.496
- Cite (ACL):
- Haoran Li, Junnan Zhu, Jiajun Zhang, Xiaodong He, and Chengqing Zong. 2020. Multimodal Sentence Summarization via Multimodal Selective Encoding. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5655–5667, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Multimodal Sentence Summarization via Multimodal Selective Encoding (Li et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.coling-main.496.pdf