The Source-Target Domain Mismatch Problem in Machine Translation
Jiajun Shen, Peng-Jen Chen, Matthew Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc’Aurelio Ranzato
Abstract
While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that this causes the domains of the source and target language to greatly mismatch. We first formalize the concept of source-target domain mismatch, propose a metric to quantify it, and provide empirical evidence for its existence. We conclude with an empirical study of how source-target domain mismatch affects training of machine translation systems on low-resource languages. While this mismatch may severely affect back-translation, the degradation can be alleviated by combining back-translation with self-training and by increasing the amount of target-side monolingual data.
- Anthology ID:
- 2021.eacl-main.130
- Volume:
- Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
- Month:
- April
- Year:
- 2021
- Address:
- Online
- Editors:
- Paola Merlo, Jorg Tiedemann, Reut Tsarfaty
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1519–1533
- URL:
- https://aclanthology.org/2021.eacl-main.130
- DOI:
- 10.18653/v1/2021.eacl-main.130
- Cite (ACL):
- Jiajun Shen, Peng-Jen Chen, Matthew Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, and Marc’Aurelio Ranzato. 2021. The Source-Target Domain Mismatch Problem in Machine Translation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1519–1533, Online. Association for Computational Linguistics.
- Cite (Informal):
- The Source-Target Domain Mismatch Problem in Machine Translation (Shen et al., EACL 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2021.eacl-main.130.pdf
- Data:
- MTNT, OpenSubtitles