Tulun: Transparent and Adaptable Low-resource Machine Translation
Raphael Merx, Hanna Suominen, Lois Yinghui Hong, Nick Thieberger, Trevor Cohn, Ekaterina Vylomova
Abstract
Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose Tulun, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories.Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy.Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90-22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, Tulun outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF++ points over NLLB-54B. Tulun is publicly accessible at https://bislama-trans.rapha.dev.- Anthology ID:
- 2025.acl-demo.13
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Pushkar Mishra, Smaranda Muresan, Tao Yu
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 129–139
- Language:
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.13/
- DOI:
- Cite (ACL):
- Raphael Merx, Hanna Suominen, Lois Yinghui Hong, Nick Thieberger, Trevor Cohn, and Ekaterina Vylomova. 2025. Tulun: Transparent and Adaptable Low-resource Machine Translation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 129–139, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Tulun: Transparent and Adaptable Low-resource Machine Translation (Merx et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-demo.13.pdf