Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation

Chan Young Park; Yulia Tsvetkov

doi:10.18653/v1/D19-5626

Abstract

Neural machine translation (NMT) often fails in one-to-many translation, e.g., in the translation of multi-word expressions, compounds, and collocations. To improve the translation of phrases, phrase-based NMT systems have been proposed; these typically combine word-based NMT with external phrase dictionaries or with phrase tables from phrase-based statistical MT systems. These solutions introduce significant overhead in additional resources and computational cost. In this paper, we introduce a phrase-based NMT model built upon continuous-output NMT, in which the decoder generates embeddings of words or phrases. The model uses a fertility module, which guides the decoder to generate embeddings of sequences of varying lengths. We show that our model learns to translate phrases better, performing on par with state-of-the-art phrase-based NMT. Since our model does not resort to softmax computation over a huge vocabulary of phrases, it trains about 112x faster than the baseline.
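To illustrate the idea of a decoder that emits an embedding instead of a softmax distribution, the following is a minimal sketch of how a predicted vector could be mapped back to a word or phrase by nearest-neighbor lookup. The abstract does not describe the lookup procedure, so the cosine-similarity search, the toy embedding table, and the names `cosine` and `nearest_entry` are all assumptions made for illustration. The fertility module is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_entry(predicted, table):
    """Return the table key whose embedding is most similar to `predicted`.

    `table` maps a word or phrase to its embedding (hypothetical data).
    """
    return max(table, key=lambda entry: cosine(predicted, table[entry]))


# Toy embedding table mixing single words and a multi-word phrase
# (values are made up for illustration only).
toy_table = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "kick the bucket": [0.0, 0.0, 1.0],
}

# Pretend this vector came out of the decoder at one time step.
predicted = [0.1, 0.2, 0.9]
print(nearest_entry(predicted, toy_table))
```

In this sketch a phrase is just another entry in the table, so emitting a multi-word unit costs the same single lookup as emitting one word. No probability distribution over the table is computed at any point.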

Anthology ID: D19-5626
Volume: Proceedings of the 3rd Workshop on Neural Generation and Translation
Month: November
Year: 2019
Address: Hong Kong
Editors: Alexandra Birch, Andrew Finch, Hiroaki Hayashi, Ioannis Konstas, Thang Luong, Graham Neubig, Yusuke Oda, Katsuhito Sudoh
Venue: NGT
Publisher: Association for Computational Linguistics
Pages: 241–248
URL: https://preview.aclanthology.org/iwcs-25-ingestion/D19-5626/
DOI: 10.18653/v1/D19-5626
Cite (ACL): Chan Young Park and Yulia Tsvetkov. 2019. Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 241–248, Hong Kong. Association for Computational Linguistics.
Cite (Informal): Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation (Park & Tsvetkov, NGT 2019)
PDF: https://preview.aclanthology.org/iwcs-25-ingestion/D19-5626.pdf
