G-Transformer for Document-Level Machine Translation

Guangsheng Bao, Yue Zhang, Zhiyang Teng, Boxing Chen, Weihua Luo


Abstract
Document-level MT models are still far from satisfactory. Existing work extends the translation unit from a single sentence to multiple sentences. However, studies show that when the translation unit is further enlarged to a whole document, supervised training of Transformer can fail. In this paper, we find that such failure is not caused by overfitting, but by the model getting stuck in local minima during training. Our analysis shows that the increased complexity of target-to-source attention is a reason for the failure. As a solution, we propose G-Transformer, which introduces a locality assumption as an inductive bias into Transformer, reducing the hypothesis space of the attention from target to source. Experiments show that G-Transformer converges faster and more stably than Transformer, achieving new state-of-the-art BLEU scores under both non-pretraining and pretraining settings on three benchmark datasets.
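To illustrate the locality assumption on target-to-source attention described in the abstract, below is a minimal sketch (not the authors' released implementation; the function name, group indices, and tensor shapes are illustrative assumptions) of a cross-attention mask that restricts each target token to the source tokens of its aligned sentence.

import torch

def group_attention_mask(src_groups: torch.Tensor, tgt_groups: torch.Tensor) -> torch.Tensor:
    # Boolean mask of shape (tgt_len, src_len); True means attention is allowed.
    # src_groups / tgt_groups hold the sentence index of each token, so a target
    # token may only attend to source tokens of the same sentence group.
    return tgt_groups.unsqueeze(1) == src_groups.unsqueeze(0)

# Example: a 2-sentence document with a 5-token source and a 4-token target.
src_groups = torch.tensor([0, 0, 0, 1, 1])
tgt_groups = torch.tensor([0, 0, 1, 1])
mask = group_attention_mask(src_groups, tgt_groups)

# Apply the mask inside scaled dot-product attention by setting cross-sentence
# positions to -inf before the softmax, so they receive zero attention weight.
scores = torch.randn(4, 5)                       # stand-in attention logits
scores = scores.masked_fill(~mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)             # each row sums to 1 within its group
print(attn)

Such a mask shrinks the hypothesis space of the cross-attention from the whole document to one sentence per target position, which is the locality bias the abstract refers to; the paper's full model additionally allows document-wide (global) attention rather than relying on local attention alone.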
Anthology ID:
2021.acl-long.267
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
3442–3455
URL:
https://aclanthology.org/2021.acl-long.267
DOI:
10.18653/v1/2021.acl-long.267
Cite (ACL):
Guangsheng Bao, Yue Zhang, Zhiyang Teng, Boxing Chen, and Weihua Luo. 2021. G-Transformer for Document-Level Machine Translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3442–3455, Online. Association for Computational Linguistics.
Cite (Informal):
G-Transformer for Document-Level Machine Translation (Bao et al., ACL-IJCNLP 2021)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2021.acl-long.267.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2021.acl-long.267.mp4
Code
baoguangsheng/g-transformer
Data
Europarl