Abstract
FLORES is a benchmark dataset designed for evaluating machine translation systems, particularly for low-resource languages. This paper, conducted as part of the Open Language Data Initiative (OLDI) shared task, presents our contribution to expanding the FLORES dataset with high-quality translations from Russian to Tuvan, an endangered Turkic language. Our approach drew on the linguistic expertise of native speakers to ensure both accuracy and cultural relevance in the translations. This project represents a significant step forward in supporting Tuvan as a low-resource language in the realm of natural language processing (NLP) and machine translation (MT).
- Anthology ID:
- 2024.wmt-1.46
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 593–599
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.46/
- DOI:
- 10.18653/v1/2024.wmt-1.46
- Cite (ACL):
- Ali Kuzhuget, Airana Mongush, and Nachyn-Enkhedorzhu Oorzhak. 2024. Enhancing Tuvan Language Resources through the FLORES Dataset. In Proceedings of the Ninth Conference on Machine Translation, pages 593–599, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Enhancing Tuvan Language Resources through the FLORES Dataset (Kuzhuget et al., WMT 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.46.pdf