Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings

Maksym Del, Andre Tättar, Mark Fishel


Abstract
This paper describes the University of Tartu’s submission to the unsupervised machine translation track of the WMT18 news translation shared task. We build several baseline translation systems for both directions of the English-Estonian language pair using monolingual data only; the systems belong to the phrase-based unsupervised machine translation paradigm, and we experimented with phrase lengths of up to 3. As our main contribution, we performed a set of standalone experiments with compositional phrase embeddings as a substitute for phrases as individual vocabulary entries. Results show that reasonable n-gram vectors can be obtained by simply summing up individual word vectors, which retains or improves the performance of phrase-based unsupervised machine translation systems while avoiding the limitations of atomic phrase vectors.
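The central idea of the abstract, building an n-gram vector by summing its word vectors instead of learning a separate vector for each phrase, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes English and Estonian word embeddings that are already mapped into a shared cross-lingual space, and the helper names (`phrase_vector`, `cosine`, `best_translation`) and the toy 2-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def phrase_vector(phrase: Sequence[str], word_vecs: Dict[str, Vector]) -> Vector:
    """Compose an n-gram vector by summing its word vectors (hypothetical helper)."""
    dim = len(next(iter(word_vecs.values())))
    total = [0.0] * dim
    for word in phrase:
        # Assumes every word of the phrase is in the embedding vocabulary.
        total = [t + v for t, v in zip(total, word_vecs[word])]
    return total


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_translation(
    src_phrase: Sequence[str],
    src_vecs: Dict[str, Vector],
    candidates: List[Tuple[str, ...]],
    tgt_vecs: Dict[str, Vector],
) -> Tuple[str, ...]:
    """Pick the target phrase whose composed vector is closest to the source phrase."""
    src = phrase_vector(src_phrase, src_vecs)
    return max(candidates, key=lambda tgt: cosine(src, phrase_vector(tgt, tgt_vecs)))


# Toy embeddings, assumed to live in one shared cross-lingual space.
en_vecs = {"big": [1.0, 0.0], "house": [0.0, 1.0], "small": [-1.0, 0.0]}
et_vecs = {"suur": [0.9, 0.1], "maja": [0.1, 0.9], "väike": [-0.9, 0.1]}

candidates = [("suur", "maja"), ("väike", "maja")]
print(phrase_vector(("big", "house"), en_vecs))  # [1.0, 1.0]
print(best_translation(("big", "house"), en_vecs, candidates, et_vecs))  # ('suur', 'maja')
```

Because summation ignores word order, "house big" and "big house" receive the same vector. In exchange, any n-gram made of in-vocabulary words gets a vector without being stored as its own vocabulary entry.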
Anthology ID:
W18-6407
Volume:
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:
October
Year:
2018
Address:
Belgium, Brussels
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
361–367
URL:
https://aclanthology.org/W18-6407
DOI:
10.18653/v1/W18-6407
Cite (ACL):
Maksym Del, Andre Tättar, and Mark Fishel. 2018. Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 361–367, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal):
Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings (Del et al., WMT 2018)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/W18-6407.pdf