Hierarchical Transformer for Multilingual Machine Translation

Albina Khusainova, Adil Khan, Adín Ramírez Rivera, Vitaly Romanov


Abstract
The choice of parameter sharing strategy in multilingual machine translation models determines how optimally the parameter space is used and hence directly influences ultimate translation quality. Inspired by linguistic trees that show the degree of relatedness between different languages, a new general approach to parameter sharing in multilingual machine translation was recently suggested. The main idea is to use these expert language hierarchies as a basis for the multilingual architecture: the closer two languages are, the more parameters they share. In this work, we test this idea using the Transformer architecture and show that, despite the success reported in previous work, there are problems inherent to training such hierarchical models. We demonstrate that with a carefully chosen training strategy the hierarchical architecture can outperform bilingual models and multilingual models with full parameter sharing.
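The sharing scheme described in the abstract can be made concrete with a minimal PyTorch sketch: one Transformer layer is attached to each node of the language tree, and a language's encoder is the stack of layers along its root-to-leaf path, so more closely related languages share a longer prefix of layers. The tree, language codes, and layer sizes below are illustrative assumptions for exposition, not the authors' implementation.

    # Minimal sketch of tree-based parameter sharing (illustrative, not the
    # paper's code). Each tree node owns one Transformer encoder layer; a
    # language's encoder is the layer stack on its root-to-leaf path.
    import torch
    import torch.nn as nn

    # Hypothetical root-to-leaf paths in a toy linguistic tree: "tr" and "az"
    # share the "root" and "turkic" layers but keep their own leaf layers.
    PATHS = {
        "tr": ["root", "turkic", "tr"],
        "az": ["root", "turkic", "az"],
        "ru": ["root", "slavic", "ru"],
        "uk": ["root", "slavic", "uk"],
    }

    class HierarchicalEncoder(nn.Module):
        def __init__(self, d_model=128, nhead=4):
            super().__init__()
            nodes = {n for path in PATHS.values() for n in path}
            # One Transformer encoder layer per tree node.
            self.layers = nn.ModuleDict({
                n: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
                for n in nodes
            })

        def forward(self, x, lang):
            # Apply the layers on this language's root-to-leaf path, in order.
            for node in PATHS[lang]:
                x = self.layers[node](x)
            return x

    enc = HierarchicalEncoder()
    x = torch.randn(2, 10, 128)   # (batch, seq_len, d_model)
    print(enc(x, "tr").shape)     # torch.Size([2, 10, 128])

Under this scheme, gradients from every language update the root layer, while leaf layers are updated only by their own language, which is also where the training difficulties discussed in the paper arise: the shared and private parameters see very different effective batch sizes and update frequencies.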
Anthology ID:
2021.vardial-1.2
Volume:
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
April
Year:
2021
Address:
Kiyv, Ukraine
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
12–20
URL:
https://aclanthology.org/2021.vardial-1.2
Cite (ACL):
Albina Khusainova, Adil Khan, Adín Ramírez Rivera, and Vitaly Romanov. 2021. Hierarchical Transformer for Multilingual Machine Translation. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 12–20, Kiyv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Hierarchical Transformer for Multilingual Machine Translation (Khusainova et al., VarDial 2021)
PDF:
https://aclanthology.org/2021.vardial-1.2.pdf