Abstract
Back-translation is a data augmentation technique that has been shown to improve model quality through the creation of synthetic training bitext. Early studies demonstrated the promise of the technique, and follow-on studies have produced additional refinements. We have undertaken a broad investigation using back-translation to train models from 60 languages into English; the majority of these languages are considered moderate- or low-resource. We observed consistent gains, and, compared to prior work, conspicuous gains in a number of lower-resourced languages. We analyzed differences in translations between baseline and back-translation models, and observed many indications of improved translation quality. Translation of both rare and common terms is improved, and these improvements occur despite the less natural synthetic source-language text used in training.
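To make the recipe the abstract describes concrete, here is a minimal sketch of back-translation for augmenting X-to-English training data: monolingual English text is translated "backwards" into the source language X with a reverse (English-to-X) model, and the synthetic X sentences are paired with the original English as extra bitext. This is an illustrative sketch, not the authors' pipeline; the MarianMT model name, the `back_translate` helper, and the example sentences are assumptions chosen for demonstration.

```python
# Minimal back-translation sketch (illustrative, not the paper's actual setup).
from transformers import MarianMTModel, MarianTokenizer

# Assumption: a pretrained reverse (English -> X) model. The en-fr model is
# used here purely as a stand-in for whatever reverse system is available.
model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def back_translate(english_sentences):
    """Return (synthetic_source, english) pairs of synthetic bitext."""
    batch = tokenizer(english_sentences, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    synthetic = tokenizer.batch_decode(generated, skip_special_tokens=True)
    return list(zip(synthetic, english_sentences))

# Monolingual English text stands in for the target-side corpus.
monolingual_english = [
    "The committee approved the budget on Tuesday.",
    "Rainfall was unusually heavy this season.",
]
synthetic_bitext = back_translate(monolingual_english)

# The synthetic pairs would then be mixed with the real parallel corpus
# and the X -> English model trained on the combined data.
for src, tgt in synthetic_bitext:
    print(f"{src}\t{tgt}")
```

Note that the source side of each synthetic pair is machine-generated (and hence less natural), while the target side is genuine English, which is why gains on target-side fluency persist despite the noisier synthetic inputs.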
- Anthology ID: 2023.findings-acl.518
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8166–8183
- URL: https://aclanthology.org/2023.findings-acl.518
- DOI: 10.18653/v1/2023.findings-acl.518
- Cite (ACL): Paul McNamee and Kevin Duh. 2023. An Extensive Exploration of Back-Translation in 60 Languages. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8166–8183, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): An Extensive Exploration of Back-Translation in 60 Languages (McNamee & Duh, Findings 2023)
- PDF: https://aclanthology.org/2023.findings-acl.518.pdf