Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation

Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, Sadao Kurohashi


Abstract
This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the default. However, Xu et al. (2019) revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may overfit supervised directions and thus have low generalizability for ZST. Through experiments on OPUS, IWSLT, and Europarl datasets for 54 ZST directions, we demonstrate that the original Transformer setting of LayerNorm after residual connections (PostNorm) consistently outperforms PreNorm by up to 12.3 BLEU points. We then study the performance disparities by analyzing the differences in off-target rates and structural variations between PreNorm and PostNorm. This study highlights the need for careful consideration of the LayerNorm setting for ZST.
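To make the PreNorm/PostNorm distinction concrete, the following is a minimal PyTorch sketch of a Transformer sublayer wrapper (not the authors' implementation; the class name and toy feed-forward sublayer are illustrative assumptions). PreNorm applies LayerNorm to the sublayer input and adds the residual afterwards, while PostNorm, the original Transformer setting, applies LayerNorm after the residual connection.

```python
import torch
import torch.nn as nn

class SublayerConnection(nn.Module):
    """Wraps a Transformer sublayer (attention or feed-forward) with a
    residual connection and LayerNorm.

    pre_norm=True  : x + sublayer(LayerNorm(x))      # PreNorm
    pre_norm=False : LayerNorm(x + sublayer(x))      # PostNorm (original Transformer)
    """
    def __init__(self, d_model: int, pre_norm: bool):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pre_norm = pre_norm

    def forward(self, x, sublayer):
        if self.pre_norm:
            return x + sublayer(self.norm(x))
        return self.norm(x + sublayer(x))


# Toy usage: wrap a feed-forward sublayer in both variants.
d_model = 8
ffn = nn.Sequential(nn.Linear(d_model, 32), nn.ReLU(), nn.Linear(32, d_model))
x = torch.randn(2, 5, d_model)  # (batch, sequence length, d_model)

pre_out = SublayerConnection(d_model, pre_norm=True)(x, ffn)
post_out = SublayerConnection(d_model, pre_norm=False)(x, ffn)
print(pre_out.shape, post_out.shape)
```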
Anthology ID:
2023.acl-short.112
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1300–1316
URL:
https://aclanthology.org/2023.acl-short.112
DOI:
10.18653/v1/2023.acl-short.112
Cite (ACL):
Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi. 2023. Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1300–1316, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation (Mao et al., ACL 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.acl-short.112.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2023.acl-short.112.mp4