The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc’Aurelio Ranzato, Francisco Guzmán, Angela Fan
Abstract
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the Flores-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and domains. These sentences have been translated in 101 languages by professional translators through a carefully controlled process. The resulting dataset enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems, as all translations are fully aligned. By publicly releasing such a high-quality and high-coverage dataset, we hope to foster progress in the machine translation community and beyond.
- Anthology ID:
- 2022.tacl-1.30
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 10
- Year:
- 2022
- Address:
- Cambridge, MA
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 522–538
- URL:
- https://aclanthology.org/2022.tacl-1.30
- DOI:
- 10.1162/tacl_a_00474
- Cite (ACL):
- Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc’Aurelio Ranzato, Francisco Guzmán, and Angela Fan. 2022. The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation. Transactions of the Association for Computational Linguistics, 10:522–538.
- Cite (Informal):
- The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation (Goyal et al., TACL 2022)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2022.tacl-1.30.pdf