Abstract
In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; often multilingual parameter sharing results in a decrease in accuracy due to translation models not being able to accommodate different languages in their limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model. We find that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family. However, even in the case where target languages are from different families where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partial sharing of parameters can lead to substantial improvements in translation accuracy.- Anthology ID:
- W18-6327
- Volume:
- Proceedings of the Third Conference on Machine Translation: Research Papers
- Month:
- October
- Year:
- 2018
- Address:
- Brussels, Belgium
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 261–271
- URL:
- https://aclanthology.org/W18-6327
- DOI:
- 10.18653/v1/W18-6327
- Cite (ACL):
- Devendra Sachan and Graham Neubig. 2018. Parameter Sharing Methods for Multilingual Self-Attentional Translation Models. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 261–271, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Parameter Sharing Methods for Multilingual Self-Attentional Translation Models (Sachan & Neubig, WMT 2018)
- PDF:
- https://preview.aclanthology.org/author-url/W18-6327.pdf
- Code
- DevSinghSachan/multilingual_nmt
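As a rough illustration of the partial-sharing idea described in the abstract, the sketch below shows a Transformer decoder layer in which the attention sub-layers are shared across target languages while the feed-forward sub-layer is language-specific. This is a hypothetical configuration for exposition only, not the authors' exact sharing scheme or the code in the linked repository.

```python
# Minimal PyTorch sketch of partial parameter sharing across target languages.
# Hypothetical: which sub-layers are shared vs. private is an assumption here,
# not the specific configuration proposed in the paper.
import torch
import torch.nn as nn


class PartiallySharedDecoderLayer(nn.Module):
    """One decoder layer: self-/cross-attention shared across languages,
    feed-forward parameters kept separate per target language."""

    def __init__(self, d_model, n_heads, d_ff, languages):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Language-specific ("private") feed-forward blocks.
        self.ffn = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            for lang in languages
        })
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, memory, lang):
        # Shared self-attention over decoder states.
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt, need_weights=False)[0])
        # Shared cross-attention over encoder states.
        x = self.norm2(x + self.cross_attn(x, memory, memory, need_weights=False)[0])
        # Language-specific feed-forward, selected by the target language tag.
        return self.norm3(x + self.ffn[lang](x))


# Usage: one layer instance serves both target languages; only the feed-forward
# weights indexed by `lang` differ between them.
layer = PartiallySharedDecoderLayer(d_model=64, n_heads=4, d_ff=128, languages=["de", "ja"])
memory = torch.randn(2, 10, 64)  # encoder output (batch, src_len, d_model)
tgt = torch.randn(2, 7, 64)      # decoder input states (batch, tgt_len, d_model)
out_de = layer(tgt, memory, lang="de")
out_ja = layer(tgt, memory, lang="ja")
```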