Learning to Describe Differences Between Pairs of Similar Images
Abstract
In this paper, we introduce the task of automatically generating text to describe the differences between two similar images. We collect a new dataset by crowd-sourcing difference descriptions for pairs of image frames extracted from video-surveillance footage. Annotators were asked to succinctly describe all the differences in a short paragraph. As a result, our novel dataset provides an opportunity to explore models that align language and vision, and capture visual salience. The dataset may also be a useful benchmark for coherent multi-sentence generation. We perform a first-pass visual analysis that exposes clusters of differing pixels as a proxy for object-level differences. We propose a model that captures visual salience by using a latent variable to align clusters of differing pixels with output sentences. We find that, for both single-sentence and multi-sentence generation, the proposed model outperforms models that use attention alone.
- Anthology ID:
- D18-1436
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4024–4034
- URL:
- https://aclanthology.org/D18-1436
- DOI:
- 10.18653/v1/D18-1436
- Cite (ACL):
- Harsh Jhamtani and Taylor Berg-Kirkpatrick. 2018. Learning to Describe Differences Between Pairs of Similar Images. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4024–4034, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Learning to Describe Differences Between Pairs of Similar Images (Jhamtani & Berg-Kirkpatrick, EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/D18-1436.pdf
- Code
- harsh19/spot-the-diff
- Data
- Spot-the-diff
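
The abstract's "first-pass visual analysis" groups differing pixels into clusters that serve as a proxy for object-level differences. The sketch below is a minimal illustration of that idea, not the authors' pipeline: it assumes two aligned uint8 RGB frames, an illustrative per-pixel distance threshold, and SciPy connected-component labeling as a stand-in for whatever clustering step the paper uses; the helper name `diff_clusters` is hypothetical.

```python
# Minimal sketch: expose clusters of differing pixels between two aligned
# frames as candidate object-level differences. Illustrative only; the
# threshold and clustering choices are assumptions, not the paper's exact method.
import numpy as np
from scipy import ndimage

def diff_clusters(img_a: np.ndarray, img_b: np.ndarray, threshold: float = 30.0):
    """Return bounding boxes (r0, r1, c0, c1) of connected regions of differing pixels.

    img_a, img_b: aligned uint8 RGB frames of identical shape (H, W, 3).
    threshold: per-pixel RGB distance above which a pixel counts as "different".
    """
    # Per-pixel color distance between the two frames.
    dist = np.linalg.norm(img_a.astype(np.float32) - img_b.astype(np.float32), axis=-1)
    mask = dist > threshold

    # Group differing pixels into connected components; in the paper's model,
    # such clusters are the visual units a latent variable aligns with sentences.
    labels, num_clusters = ndimage.label(mask)
    boxes = ndimage.find_objects(labels)
    return [(s[0].start, s[0].stop, s[1].start, s[1].stop) for s in boxes if s is not None]
```

Each returned box marks a region where the two frames disagree; in the proposed model, a latent alignment variable ties such clusters to the sentences that describe them.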