Abstract
Despite the reported success of unsupervised machine translation (MT), the field has yet to examine the conditions under which the methods succeed and fail. We conduct an extensive empirical evaluation using dissimilar language pairs, dissimilar domains, and diverse datasets. We find that performance rapidly deteriorates when source and target corpora are from different domains, and that stochasticity during embedding training can dramatically affect downstream results. We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs. We advocate for extensive empirical evaluation of unsupervised MT systems to highlight failure points and encourage continued research on the most promising paradigms. We release our preprocessed dataset to encourage evaluations that stress-test systems under multiple data conditions.
- Anthology ID:
- 2020.wmt-1.68
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 571–583
- URL:
- https://aclanthology.org/2020.wmt-1.68
- Cite (ACL):
- Kelly Marchisio, Kevin Duh, and Philipp Koehn. 2020. When Does Unsupervised Machine Translation Work? In Proceedings of the Fifth Conference on Machine Translation, pages 571–583, Online. Association for Computational Linguistics.
- Cite (Informal):
- When Does Unsupervised Machine Translation Work? (Marchisio et al., WMT 2020)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2020.wmt-1.68.pdf