A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images

Melissa Ailem; Bowen Zhang; Aurélien Bellet; Pascal Denis; Fei Sha

doi:10.18653/v1/D18-1177

Abstract

Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g., features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple together a skip-gram model for co-occurrence in linguistic data and a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains results that are competitive with or stronger than those of other state-of-the-art multimodal models.

Anthology ID: D18-1177
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1478–1487
URL: https://preview.aclanthology.org/add_missing_videos/D18-1177/
DOI: 10.18653/v1/D18-1177
Cite (ACL): Melissa Ailem, Bowen Zhang, Aurelien Bellet, Pascal Denis, and Fei Sha. 2018. A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1478–1487, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images (Ailem et al., EMNLP 2018)
PDF: https://preview.aclanthology.org/add_missing_videos/D18-1177.pdf
Data: ImageNet