Tao–Filipino Neural Machine Translation: Strategies for Ultra–Low-Resource Settings
Adrian Denzel Macayan, Luis Andrew Sunga Madridijo, Ellexandrei Esponilla, Zachary Mitchell Francisco
Abstract
Neural Machine Translation (NMT) performance degrades significantly in ultra-low-resource settings, particularly for endangered languages like Tao (Yami), which lack extensive parallel corpora. This study investigates strategies to bootstrap a Tao-Tagalog translation system using the NLLB-200 (600 million parameter) model under extremely limited supervision. We propose a multi-faceted approach combining domain-specific fine-tuning, synthetic data augmentation, and cross-lingual transfer learning. Specifically, we leverage the phylogenetic proximity of Ivatan, a related Batanic language, to pre-train the model, and utilize dictionary-based generation to construct synthetic conversational data. Our results demonstrate that transfer learning from Ivatan improves translation quality on in-domain religious texts, achieving a BLEU score of 34.85. Conversely, incorporating synthetic data enhances the model's ability to generalize to conversational contexts, mitigating the domain bias often inherent in religious corpora. These findings highlight the effectiveness of exploiting linguistic typology and structured lexical resources to develop functional NMT systems for under-represented Austronesian languages.
- Anthology ID:
- 2026.loresmt-1.2
- Volume:
- Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jonathan Washington, Nathaniel Oco, Xiaobing Zhao
- Venues:
- LoResMT | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 27–36
- URL:
- https://preview.aclanthology.org/manual-author-scripts/2026.loresmt-1.2/
- Cite (ACL):
- Adrian Denzel Macayan, Luis Andrew Sunga Madridijo, Ellexandrei Esponilla, and Zachary Mitchell Francisco. 2026. Tao–Filipino Neural Machine Translation: Strategies for Ultra–Low-Resource Settings. In Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026), pages 27–36, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Tao–Filipino Neural Machine Translation: Strategies for Ultra–Low-Resource Settings (Macayan et al., LoResMT 2026)
- PDF:
- https://preview.aclanthology.org/manual-author-scripts/2026.loresmt-1.2.pdf