Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service

Raphael Merx, Adérito José Guterres Correia, Hanna Suominen, Ekaterina Vylomova


Abstract
Low-resource machine translation (MT) presents a diversity of community needs and application challenges that remain poorly understood. To complement surveys and focus groups, which tend to rely on small samples of respondents, we propose an observational study on actual usage patterns of a specialized MT service for the Tetun language, which is the lingua franca in Timor-Leste. Our analysis of 100,000 translation requests reveals patterns that challenge assumptions based on existing corpora. We find that users, many of them students on mobile devices, typically translate text from a high-resource language into Tetun across diverse domains including science, healthcare, and daily life. This contrasts sharply with available Tetun corpora, which are dominated by news articles covering government and social issues. Our results suggest that MT systems for institutionalized minority languages like Tetun should prioritize accuracy on domains relevant to educational contexts, in the high-resource to low-resource direction. More broadly, this study demonstrates how observational analysis can inform low-resource language technology development, by grounding research in practical community needs.
Anthology ID:
2025.loresmt-1.7
Volume:
Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, U.S.A.
Editors:
Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jonathan Washington, Nathaniel Oco, Xiaobing Zhao
Venues:
LoResMT | WS
Publisher:
Association for Computational Linguistics
Pages:
54–65
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.loresmt-1.7/
Cite (ACL):
Raphael Merx, Adérito José Guterres Correia, Hanna Suominen, and Ekaterina Vylomova. 2025. Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service. In Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025), pages 54–65, Albuquerque, New Mexico, U.S.A. Association for Computational Linguistics.
Cite (Informal):
Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service (Merx et al., LoResMT 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.loresmt-1.7.pdf