Learning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation

Pranava Swaroop Madhyastha; Cristina España-Bonet

doi:10.18653/v1/W17-2617

Abstract

We propose a simple log-bilinear softmax-based model to deal with vocabulary expansion in machine translation. Our model uses word embeddings trained on very large unlabelled monolingual corpora and learns from a fairly small word-to-word bilingual dictionary. Given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language using the trained bilingual embeddings. We integrate these translation options into a standard phrase-based statistical machine translation system and obtain consistent improvements in translation quality on the English–Spanish language pair. When tested on an out-of-domain test set, we obtain a significant improvement of 3.9 BLEU points.
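The abstract describes the model only at a high level. As a rough illustration, the sketch below assumes the log-bilinear softmax takes the form p(t | s) ∝ exp(x_sᵀ W y_t), where x_s and y_t are fixed pretrained source and target embeddings and the matrix W is the only learned parameter, fitted by plain stochastic gradient descent on dictionary pairs. The toy 2-dimensional embeddings, the word lists, and the names `BilinearTranslator` and `translate` are hypothetical; the paper's exact parametrization, regularization, and training procedure may differ.

```python
import math
import random

# ASSUMPTION: p(t | s) = softmax_t(x_s^T W y_t) over the target vocabulary, with
# fixed pretrained embeddings x_s (source) and y_t (target) and only W learned.


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class BilinearTranslator:
    """Hypothetical log-bilinear softmax translator (illustrative only)."""

    def __init__(self, src_emb, tgt_emb, seed=0):
        self.src = src_emb                                # source word -> vector (dim n)
        self.tgt_words = list(tgt_emb)                    # target vocabulary
        self.tgt = [tgt_emb[w] for w in self.tgt_words]   # target vectors (dim m)
        n = len(next(iter(src_emb.values())))
        m = len(self.tgt[0])
        rng = random.Random(seed)
        # Small random initialisation of the n x m bilinear matrix W.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(m)] for _ in range(n)]

    def probs(self, x):
        """Return p(t | x) for every target word t, in vocabulary order."""
        n, m = len(self.W), len(self.W[0])
        xW = [sum(x[i] * self.W[i][j] for i in range(n)) for j in range(m)]
        scores = [sum(a * b for a, b in zip(xW, y)) for y in self.tgt]
        return softmax(scores)

    def train(self, pairs, epochs=200, lr=0.5):
        """Plain SGD on -log p(t | s) over (source, target) dictionary pairs."""
        index = {w: k for k, w in enumerate(self.tgt_words)}
        n, m = len(self.W), len(self.W[0])
        for _ in range(epochs):
            for s, t in pairs:
                x = self.src[s]
                p = self.probs(x)
                # Gradient w.r.t. W[i][j] is x[i] * (E_p[y][j] - y_t[j]).
                expected = [sum(p[k] * y[j] for k, y in enumerate(self.tgt))
                            for j in range(m)]
                gold = self.tgt[index[t]]
                diff = [expected[j] - gold[j] for j in range(m)]
                for i in range(n):
                    for j in range(m):
                        self.W[i][j] -= lr * x[i] * diff[j]

    def translate(self, x, k=3):
        """Return the k most probable (target word, probability) pairs."""
        ranked = sorted(zip(self.tgt_words, self.probs(x)),
                        key=lambda wp: wp[1], reverse=True)
        return ranked[:k]


# Toy 2-d embeddings (hypothetical); "puppy" is absent from the dictionary.
src_emb = {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "puppy": [0.9, 0.1]}
tgt_emb = {"perro": [1.0, 0.0], "gato": [0.0, 1.0], "casa": [-1.0, -1.0]}
dictionary = [("dog", "perro"), ("cat", "gato")]

model = BilinearTranslator(src_emb, tgt_emb)
model.train(dictionary)
options = model.translate(src_emb["puppy"])
for word, prob in options:
    print(f"{word}\t{prob:.3f}")
```

Here `puppy` never appears in the training dictionary, yet its ranked list is headed by `perro` because its source embedding lies close to that of `dog`. In a phrase-based system, a ranked list like this, together with its probabilities, could serve as the candidate translation options for the out-of-vocabulary word.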

Anthology ID: W17-2617
Volume: Proceedings of the 2nd Workshop on Representation Learning for NLP
Month: August
Year: 2017
Address: Vancouver, Canada
Venues: RepL4NLP | WS
SIG: SIGREP
Publisher: Association for Computational Linguistics
Pages: 139–145
URL: https://aclanthology.org/W17-2617
DOI: 10.18653/v1/W17-2617
Cite (ACL): Pranava Swaroop Madhyastha and Cristina España-Bonet. 2017. Learning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 139–145, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): Learning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation (Madhyastha & España-Bonet, 2017)
PDF: https://preview.aclanthology.org/update-css-js/W17-2617.pdf
