@inproceedings{currey-heafield-2019-zero,
title = "Zero-Resource Neural Machine Translation with Monolingual Pivot Data",
author = "Currey, Anna and
Heafield, Kenneth",
editor = "Birch, Alexandra and
Finch, Andrew and
Hayashi, Hiroaki and
Konstas, Ioannis and
Luong, Thang and
Neubig, Graham and
Oda, Yusuke and
Sudoh, Katsuhito",
booktitle = "Proceedings of the 3rd Workshop on Neural Generation and Translation",
month = nov,
year = "2019",
address = "Hong Kong",
publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5610/",
doi = "10.18653/v1/D19-5610",
pages = "99--107",
abstract = "Zero-shot neural machine translation (NMT) is a framework that uses source-pivot and target-pivot parallel data to train a source-target NMT system. An extension to zero-shot NMT is zero-resource NMT, which generates pseudo-parallel corpora using a zero-shot system and further trains the zero-shot system on that data. In this paper, we expand on zero-resource NMT by incorporating monolingual data in the pivot language into training; since the pivot language is usually the highest-resource language of the three, we expect monolingual pivot-language data to be most abundant. We propose methods for generating pseudo-parallel corpora using pivot-language monolingual data and for leveraging the pseudo-parallel corpora to improve the zero-shot NMT system. We evaluate these methods for a high-resource language pair (German-Russian) using English as the pivot. We show that our proposed methods yield consistent improvements over strong zero-shot and zero-resource baselines and even catch up to pivot-based models in BLEU (while not requiring the two-pass inference that pivot models require)."
}