Context-Interactive Pre-Training for Document Machine Translation

Pengcheng Yang, Pei Zhang, Boxing Chen, Jun Xie, Weihua Luo


Abstract
Document machine translation aims to translate a source sentence into the target language in the presence of additional contextual information. However, it typically suffers from a lack of document-level bilingual data. To remedy this, we propose a simple yet effective context-interactive pre-training approach that aims to benefit from external large-scale corpora. The proposed model performs inter-sentence generation to capture cross-sentence dependencies within the target document, and cross-sentence translation to make better use of valuable contextual information. Comprehensive experiments illustrate that our approach achieves state-of-the-art performance on three benchmark datasets, significantly outperforming a variety of baselines.
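As a rough illustration of the two pre-training objectives named in the abstract, the sketch below shows one plausible way to construct training examples for each from external corpora. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the "<sep>" separator token, and the two-sentence context window are all hypothetical choices for illustration.

```python
# Hypothetical sketch of the two pre-training objectives from the abstract.
# All names and hyperparameters here are illustrative assumptions.

def inter_sentence_generation_examples(target_doc):
    """Inter-sentence generation: predict each target sentence from the
    preceding target-side sentences (built from a monolingual document
    corpus), so the model learns cross-sentence dependencies."""
    examples = []
    for i in range(1, len(target_doc)):
        context = " ".join(target_doc[:i])  # preceding target sentences
        examples.append({"src": context, "tgt": target_doc[i]})
    return examples

def cross_sentence_translation_examples(src_doc, tgt_doc, window=2):
    """Cross-sentence translation: translate a source sentence while
    conditioning on its source-side context (built from sentence-aligned
    bilingual data grouped into documents). The window size is an
    assumption for this sketch."""
    examples = []
    for i, (src, tgt) in enumerate(zip(src_doc, tgt_doc)):
        context = " ".join(src_doc[max(0, i - window):i])
        examples.append({"src": context + " <sep> " + src, "tgt": tgt})
    return examples

if __name__ == "__main__":
    # Toy usage: both objectives produce (src, tgt) pairs that a standard
    # encoder-decoder NMT model could be pre-trained on.
    print(inter_sentence_generation_examples(["Sent A.", "Sent B.", "Sent C."]))
    print(cross_sentence_translation_examples(["Satz A.", "Satz B."],
                                              ["Sent A.", "Sent B."]))
```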
Anthology ID:
2021.naacl-main.281
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3589–3595
URL:
https://aclanthology.org/2021.naacl-main.281
DOI:
10.18653/v1/2021.naacl-main.281
Cite (ACL):
Pengcheng Yang, Pei Zhang, Boxing Chen, Jun Xie, and Weihua Luo. 2021. Context-Interactive Pre-Training for Document Machine Translation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3589–3595, Online. Association for Computational Linguistics.
Cite (Informal):
Context-Interactive Pre-Training for Document Machine Translation (Yang et al., NAACL 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.naacl-main.281.pdf