Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

Devendra Sachan; Graham Neubig

doi:10.18653/v1/W18-6327

Abstract

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; often multilingual parameter sharing results in a decrease in accuracy due to translation models not being able to accommodate different languages in their limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model. We find that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family. However, even in the case where target languages are from different families where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partial sharing of parameters can lead to substantial improvements in translation accuracy.

Anthology ID: W18-6327
Volume: Proceedings of the Third Conference on Machine Translation: Research Papers
Month: October
Year: 2018
Address: Brussels, Belgium
Venues: EMNLP | WMT | WS
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 261–271
URL: https://aclanthology.org/W18-6327
DOI: 10.18653/v1/W18-6327
Cite (ACL): Devendra Sachan and Graham Neubig. 2018. Parameter Sharing Methods for Multilingual Self-Attentional Translation Models. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 261–271, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Parameter Sharing Methods for Multilingual Self-Attentional Translation Models (Sachan & Neubig, 2018)
PDF: https://preview.aclanthology.org/update-css-js/W18-6327.pdf
Code: DevSinghSachan/multilingual_nmt
