Survey of Low-Resource Machine Translation
Barry Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, Alexandra Birch
Abstract
We present a survey covering the state of the art in low-resource machine translation (MT) research. There are currently around 7,000 languages spoken in the world and almost all language pairs lack significant resources for training machine translation models. There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available. We present a summary of this topical research field and provide a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.
- Anthology ID:
- 2022.cl-3.6
- Volume:
- Computational Linguistics, Volume 48, Issue 3 - September 2022
- Month:
- September
- Year:
- 2022
- Address:
- Cambridge, MA
- Venue:
- CL
- Publisher:
- MIT Press
- Pages:
- 673–732
- URL:
- https://aclanthology.org/2022.cl-3.6
- DOI:
- 10.1162/coli_a_00446
- Cite (ACL):
- Barry Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, and Alexandra Birch. 2022. Survey of Low-Resource Machine Translation. Computational Linguistics, 48(3):673–732.
- Cite (Informal):
- Survey of Low-Resource Machine Translation (Haddow et al., CL 2022)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2022.cl-3.6.pdf
- Data:
- CC100, FLORES-101, FLoRes, Samanantar, Tatoeba