Train Global, Tailor Local: Minimalist Multilingual Translation into Endangered Languages

Zhong Zhou, Jan Niehues, Alexander Waibel


Abstract
In many humanitarian scenarios, translation into severely low-resource languages does not require a universal translation engine but a dedicated, text-specific one. For example, healthcare records, hygiene procedures, government communications, emergency procedures, and religious texts are all limited texts. While generic translation engines for all languages do not exist, translating multilingually known limited texts into new, endangered languages may be possible and may reduce human translation effort. We attempt to leverage translation resources from rich-resource languages to efficiently produce the best possible translation quality, in a new, severely low-resource language, for well-known texts that are available in multiple languages. We examine two approaches: 1.) selecting the best seed sentences to jump-start translation in a new language, with a view to the best generalization to the remainder of a larger targeted text (or texts), and 2.) adapting large general multilingual translation engines trained on many other languages to focus on a specific text in a new, unknown language. We find that adapting large pretrained multilingual models first to the domain/text and then to the severely low-resource language works best. If we also select the best set of seed sentences, we can improve average chrF performance on new test languages from a baseline of 21.9 to 50.7, while reducing the number of seed sentences in the new, unknown language to only ∼1,000.
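The results above are reported in chrF, a character n-gram F-score. As a rough illustration of what that metric rewards, here is a minimal, simplified sentence-level sketch. It assumes that whitespace is removed before extracting character n-grams of orders 1 to 6, and that precision and recall are averaged across orders and then combined with β = 2, which weights recall more heavily than precision. It is not the paper's evaluation code and will not exactly match standard toolkits such as sacrebleu (for instance, in how it treats n-gram orders longer than the strings, or in corpus-level aggregation).

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative sketch only).

    Precision and recall of clipped character n-gram matches are averaged over
    orders 1..max_order, then combined into an F-beta score.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 and ref_total == 0:
            continue  # order longer than both strings: nothing to compare
        matches = sum((hyp & ref).values())  # n-gram counts clipped to the minimum
        precisions.append(matches / hyp_total if hyp_total else 0.0)
        recalls.append(matches / ref_total if ref_total else 0.0)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta ** 2
    return 100.0 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(chrf("the cat sat on the mat", "the cat sat on a mat"))
```

Because matching happens at the character level, partially correct word forms still earn credit, which is one reason chrF is a common choice for morphologically rich, low-resource target languages. A reported figure such as an average chrF of 50.7 would be computed over whole test sets and averaged across test languages, not from a single sentence as in this sketch.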
Anthology ID: 2023.loresmt-1.1
Volume: Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jade Abbott, Jonathan Washington, Nathaniel Oco, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
Venue: LoResMT
Publisher: Association for Computational Linguistics
Pages: 1–15
URL: https://aclanthology.org/2023.loresmt-1.1
DOI: 10.18653/v1/2023.loresmt-1.1
Cite (ACL): Zhong Zhou, Jan Niehues, and Alexander Waibel. 2023. Train Global, Tailor Local: Minimalist Multilingual Translation into Endangered Languages. In Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023), pages 1–15, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Train Global, Tailor Local: Minimalist Multilingual Translation into Endangered Languages (Zhou et al., LoResMT 2023)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2023.loresmt-1.1.pdf
Video: https://preview.aclanthology.org/ingest-acl-2023-videos/2023.loresmt-1.1.mp4