Abstract
We present our submission to the very low resource supervised machine translation task at WMT20. We use a decoder-only transformer architecture and formulate the translation task as language modeling. To address the low-resource aspect of the problem, we pretrain on a parallel corpus of a similar language pair. We then employ an intermediate back-translation step before fine-tuning. Finally, we present an analysis of the system's performance.

- Anthology ID: 2020.wmt-1.127
- Volume: Proceedings of the Fifth Conference on Machine Translation
- Month: November
- Year: 2020
- Address: Online
- Editors: Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
- Venue: WMT
- SIG: SIGMT
- Publisher: Association for Computational Linguistics
- Pages: 1079–1083
- URL: https://aclanthology.org/2020.wmt-1.127
- Cite (ACL): Tucker Berckmann and Berkan Hiziroglu. 2020. Low-Resource Translation as Language Modeling. In Proceedings of the Fifth Conference on Machine Translation, pages 1079–1083, Online. Association for Computational Linguistics.
- Cite (Informal): Low-Resource Translation as Language Modeling (Berckmann & Hiziroglu, WMT 2020)
- PDF: https://preview.aclanthology.org/revert-3132-ingestion-checklist/2020.wmt-1.127.pdf
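The abstract's core idea, casting translation as language modeling with a decoder-only transformer, can be sketched as follows. A parallel pair (source, target) is concatenated into one token sequence on which the model is trained with the ordinary next-token objective; at inference time the model is conditioned on the source plus a separator and decoded until end-of-sequence. This is a minimal illustrative sketch: the special-token names and the example tokens are assumptions, not the authors' exact setup.

```python
# Hedged sketch of the translation-as-language-modeling formulation.
# Special tokens below are illustrative assumptions.
SEP = "<sep>"  # separates source tokens from target tokens
EOS = "<eos>"  # marks the end of the sequence


def make_training_sequence(src_tokens, tgt_tokens):
    """Concatenate source and target into one sequence; a decoder-only
    LM is then trained on it with the standard next-token loss."""
    return src_tokens + [SEP] + tgt_tokens + [EOS]


def make_inference_prompt(src_tokens):
    """At test time, condition on the source plus separator and decode
    until EOS; the generated continuation is the translation."""
    return src_tokens + [SEP]


# Hypothetical German example pair, tokenized by whitespace.
seq = make_training_sequence(["guten", "tag"], ["good", "day"])
prompt = make_inference_prompt(["guten", "tag"])
```

The separator token is what lets a single language model play the role of a conditional translation model: everything before `<sep>` is context, everything after it is the prediction target.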