Encoder and Decoder, Not One Less for Pre-trained Language Model Sponsored NMT

Sufeng Duan, Hai Zhao


Abstract
Well pre-trained contextualized representations from a pre-trained language model (PLM) have been shown to be helpful for enhancing various natural language processing tasks, including neural machine translation (NMT). However, existing methods either consider encoder-only enhancement or rely on specific multilingual PLMs, which either yields a much larger model or gives up potentially helpful knowledge from target-side PLMs. In this paper, we propose a new monolingual PLM-sponsored NMT model that lets both the encoder and the decoder enjoy PLM enhancement, alleviating both drawbacks. In particular, by incorporating a newly proposed frequency-weighted embedding transformation algorithm, PLM embeddings can be effectively exploited in the representations of the NMT decoder. We evaluate our model on the IWSLT14 En-De, De-En, WMT14 En-De, and En-Fr tasks, and the results show that the proposed PLM enhancement gives significant improvements and even helps achieve a new state-of-the-art.
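To make the idea of a frequency-weighted embedding transformation more concrete, the following is a minimal, hypothetical sketch only: this page does not describe the authors' actual algorithm, so the linear projection, the log-frequency gate, the tensor shapes, and all names (FrequencyWeightedEmbedding, token_freqs, freq_w) are assumptions made for illustration, not the method of the paper.

import torch
import torch.nn as nn


class FrequencyWeightedEmbedding(nn.Module):
    """Blend trainable NMT decoder embeddings with frozen, projected PLM
    embeddings, gating the mix by corpus token frequency (illustrative only)."""

    def __init__(self, nmt_emb: nn.Embedding, plm_emb: torch.Tensor,
                 token_freqs: torch.Tensor):
        super().__init__()
        self.nmt_emb = nmt_emb                      # (V, d_nmt), trainable
        self.register_buffer("plm_emb", plm_emb)    # (V, d_plm), frozen PLM table
        self.proj = nn.Linear(plm_emb.size(1), nmt_emb.embedding_dim)
        # Normalized log-frequency in [0, 1]; rarer tokens get a smaller value.
        logf = torch.log1p(token_freqs.float())
        self.register_buffer("freq_w", logf / logf.max())

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        e_nmt = self.nmt_emb(token_ids)             # (B, T, d_nmt)
        e_plm = self.proj(self.plm_emb[token_ids])  # (B, T, d_nmt)
        w = self.freq_w[token_ids].unsqueeze(-1)    # (B, T, 1)
        # One plausible weighting: frequent tokens rely more on the
        # task-trained NMT embedding, rare tokens borrow more from the PLM.
        return w * e_nmt + (1.0 - w) * e_plm

Under these assumptions, calling the module on token_ids of shape (batch, length) returns blended decoder input embeddings of shape (batch, length, d_nmt).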
Anthology ID:
2023.findings-acl.222
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3602–3613
URL:
https://aclanthology.org/2023.findings-acl.222
DOI:
10.18653/v1/2023.findings-acl.222
Cite (ACL):
Sufeng Duan and Hai Zhao. 2023. Encoder and Decoder, Not One Less for Pre-trained Language Model Sponsored NMT. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3602–3613, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Encoder and Decoder, Not One Less for Pre-trained Language Model Sponsored NMT (Duan & Zhao, Findings 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.222.pdf
Video:
 https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.222.mp4