Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation

Mozhdeh Gheini; Xiang Ren; Jonathan May

doi:10.18653/v1/2021.emnlp-main.132

Abstract

We study the power of cross-attention in the Transformer architecture within the context of transfer learning for machine translation, and extend the findings of studies into cross-attention when training from scratch. We conduct a series of experiments by fine-tuning a translation model on data where either the source or target language has changed. These experiments reveal that fine-tuning only the cross-attention parameters is nearly as effective as fine-tuning all parameters (i.e., the entire translation model). We provide insights into why this is the case and observe that limiting fine-tuning in this manner yields cross-lingually aligned embeddings. The implications of this finding for researchers and practitioners include mitigation of catastrophic forgetting, the potential for zero-shot translation, and the ability to extend machine translation models to several new language pairs with reduced parameter storage overhead.
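The core idea, updating only the decoder's cross-attention parameters while freezing the rest of a pretrained model, comes down to partitioning parameters by name. The sketch below is a hedged, pure-Python illustration, not the authors' code. It assumes fairseq-style parameter names in which decoder cross-attention lives under `encoder_attn` (plus an `encoder_attn_layer_norm`), and it builds a hypothetical shape table (`toy_transformer_shapes`) for a Transformer-base-like model with embeddings omitted, to estimate how small the fine-tuned subset is. The helper names and the optional `extra_prefixes` hook (e.g., for embeddings of a newly added language) are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: fairseq-style naming, where decoder cross-attention parameters
# contain "encoder_attn" (this also matches "encoder_attn_layer_norm").
CROSS_ATTN_MARKER = "encoder_attn"

Shapes = Dict[str, Tuple[int, ...]]


def is_cross_attention(name: str) -> bool:
    """True if the parameter belongs to a decoder cross-attention block."""
    return name.startswith("decoder.") and CROSS_ATTN_MARKER in name


def split_parameters(shapes: Shapes, extra_prefixes: Iterable[str] = ()) -> Tuple[List[str], List[str]]:
    """Partition parameter names into (trainable, frozen).

    Cross-attention parameters stay trainable; so does any name starting with
    one of `extra_prefixes` (hypothetical hook, e.g. new-language embeddings).
    """
    extra = tuple(extra_prefixes)
    trainable, frozen = [], []
    for name in shapes:
        # str.startswith(()) is False, so an empty `extra` adds nothing.
        if is_cross_attention(name) or name.startswith(extra):
            trainable.append(name)
        else:
            frozen.append(name)
    return trainable, frozen


def numel(shape: Tuple[int, ...]) -> int:
    """Number of scalars in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def toy_transformer_shapes(d_model: int = 512, d_ff: int = 2048, layers: int = 6) -> Shapes:
    """Hypothetical shapes for a Transformer-base-like model (embeddings omitted)."""
    shapes: Shapes = {}

    def attention(prefix: str) -> None:
        for proj in ("q_proj", "k_proj", "v_proj", "out_proj"):
            shapes[f"{prefix}.{proj}.weight"] = (d_model, d_model)
            shapes[f"{prefix}.{proj}.bias"] = (d_model,)

    def ffn_and_norms(prefix: str, norms: Tuple[str, ...]) -> None:
        shapes[f"{prefix}.fc1.weight"] = (d_ff, d_model)
        shapes[f"{prefix}.fc1.bias"] = (d_ff,)
        shapes[f"{prefix}.fc2.weight"] = (d_model, d_ff)
        shapes[f"{prefix}.fc2.bias"] = (d_model,)
        for norm in norms:
            shapes[f"{prefix}.{norm}.weight"] = (d_model,)
            shapes[f"{prefix}.{norm}.bias"] = (d_model,)

    for i in range(layers):
        enc, dec = f"encoder.layers.{i}", f"decoder.layers.{i}"
        attention(f"{enc}.self_attn")
        ffn_and_norms(enc, ("self_attn_layer_norm", "final_layer_norm"))
        attention(f"{dec}.self_attn")
        attention(f"{dec}.encoder_attn")  # cross-attention over encoder outputs
        ffn_and_norms(dec, ("self_attn_layer_norm", "encoder_attn_layer_norm", "final_layer_norm"))
    return shapes


shapes = toy_transformer_shapes()
trainable, frozen = split_parameters(shapes)
n_train = sum(numel(shapes[n]) for n in trainable)
n_total = sum(numel(s) for s in shapes.values())
print(f"trainable {n_train:,} of {n_total:,} ({n_train / n_total:.1%})")
# → trainable 6,309,888 of 44,138,496 (14.3%)
```

With a real PyTorch model, the same partition could be applied by iterating over `model.named_parameters()` and setting each parameter's `requires_grad` accordingly. Because the toy table leaves out embeddings and the output projection, the fraction above is only an illustration of why storing one cross-attention set per new language pair costs far less than storing a full model copy; it is not a figure reported in the paper.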

Anthology ID: 2021.emnlp-main.132
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1754–1765
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.emnlp-main.132/
DOI: 10.18653/v1/2021.emnlp-main.132
Cite (ACL): Mozhdeh Gheini, Xiang Ren, and Jonathan May. 2021. Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1754–1765, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (Gheini et al., EMNLP 2021)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.emnlp-main.132.pdf
Video: https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.emnlp-main.132.mp4
Code: mgheini/xattn-transfer-for-mt