Kakao Enterprise’s WMT21 Machine Translation Using Terminologies Task Submission

Yunju Bak, Jimin Sun, Jay Kim, Sungwon Lyu, Changmin Lee


Abstract
This paper describes Kakao Enterprise’s submission to the WMT21 shared Machine Translation using Terminologies task. We integrate terminology constraints by pre-training with target lemma annotations and fine-tuning with exact target annotations utilizing the given terminology dataset. This approach yields a model that achieves outstanding results in terms of both translation quality and term consistency, ranking first based on COMET in the En→Fr language direction. Furthermore, we explore various methods such as back-translation, explicitly training terminologies as additional parallel data, and in-domain data selection.
Anthology ID:
2021.wmt-1.79
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Venues:
EMNLP | WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
804–812
URL:
https://aclanthology.org/2021.wmt-1.79
Cite (ACL):
Yunju Bak, Jimin Sun, Jay Kim, Sungwon Lyu, and Changmin Lee. 2021. Kakao Enterprise’s WMT21 Machine Translation Using Terminologies Task Submission. In Proceedings of the Sixth Conference on Machine Translation, pages 804–812, Online. Association for Computational Linguistics.
Cite (Informal):
Kakao Enterprise’s WMT21 Machine Translation Using Terminologies Task Submission (Bak et al., WMT 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.wmt-1.79.pdf