Abstract
Accurate terminology translation is crucial for ensuring the practicality and reliability of neural machine translation (NMT) systems. To address this, lexically constrained NMT explores various methods to ensure pre-specified words and phrases appear in the translation output. However, in many cases, those methods are studied on general domain corpora, where the terms are mostly uni- and bi-grams (>98%). In this paper, we instead tackle a more challenging setup consisting of domain-specific corpora with much longer n-grams and highly specialized terms. Inspired by the recent success of masked span prediction models, we propose a simple and effective training strategy that achieves consistent improvements on both terminology and sentence-level translation for three domain-specific corpora in two language pairs.

- Anthology ID:
- 2021.acl-short.94
- Volume:
- Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
- Venues:
- ACL | IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 743–753
- URL:
- https://aclanthology.org/2021.acl-short.94
- DOI:
- 10.18653/v1/2021.acl-short.94
- Cite (ACL):
- Gyubok Lee, Seongjun Yang, and Edward Choi. 2021. Improving Lexically Constrained Neural Machine Translation with Source-Conditioned Masked Span Prediction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 743–753, Online. Association for Computational Linguistics.
- Cite (Informal):
- Improving Lexically Constrained Neural Machine Translation with Source-Conditioned Masked Span Prediction (Lee et al., ACL-IJCNLP 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.acl-short.94.pdf
- Code:
- wns823/NMT_SSP