When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer

Ammon Shurtz, Christian Richardson, Stephen D. Richardson


Abstract
Multilingual Neural Machine Translation (MNMT) models enhance translation quality for low-resource languages by exploiting cross-lingual similarities during training—a process known as knowledge transfer. This transfer is particularly effective between languages that share lexical or structural features, often enabled by a common orthography. However, languages with strong phonetic and lexical similarities but distinct writing systems experience limited benefits, as the absence of a shared orthography hinders knowledge transfer. To address this limitation, we propose an approach based on phonetic information that enhances token-level alignment across scripts by leveraging transliterations. We systematically evaluate several phonetic transcription techniques and strategies for incorporating phonetic information into NMT models. Our results show that using a shared encoder to process orthographic and phonetic inputs separately consistently yields the best performance for Khmer, Thai, and Lao in both directions with English, and that our custom Cognate-Aware Transliteration (CAT) method consistently improves translation quality over the baseline.
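The architectural idea named in the abstract, a single shared encoder that processes the orthographic and the phonetic token sequences separately, can be illustrated with a minimal sketch. The sketch below is not the authors' implementation: the joint vocabulary, the model dimensions, and the choice to concatenate the two encodings for a downstream decoder are all illustrative assumptions.

import torch
import torch.nn as nn

class SharedDualInputEncoder(nn.Module):
    def __init__(self, vocab_size=8000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        # One embedding table over a joint orthographic+phonetic vocabulary
        # (an assumption; separate tables over shared encoder layers would
        # also fit the abstract's description).
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, orth_ids, phon_ids):
        # The same encoder weights are applied to each input stream.
        h_orth = self.encoder(self.embed(orth_ids))
        h_phon = self.encoder(self.embed(phon_ids))
        # Concatenate along the sequence axis so a decoder can attend over
        # both views (one plausible combination strategy, assumed here).
        return torch.cat([h_orth, h_phon], dim=1)

# Toy usage: batch of 2 sentences, orthographic length 10, phonetic length 12.
model = SharedDualInputEncoder()
orth = torch.randint(0, 8000, (2, 10))
phon = torch.randint(0, 8000, (2, 12))
memory = model(orth, phon)
print(memory.shape)  # torch.Size([2, 22, 256])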
Anthology ID:
2025.mrl-main.22
Volume:
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
David Ifeoluwa Adelani, Catherine Arnett, Duygu Ataman, Tyler A. Chang, Hila Gonen, Rahul Raja, Fabian Schmidt, David Stap, Jiayi Wang
Venues:
MRL | WS
Publisher:
Association for Computational Linguistics
Pages:
336–346
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.22/
Cite (ACL):
Ammon Shurtz, Christian Richardson, and Stephen D. Richardson. 2025. When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer. In Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), pages 336–346, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer (Shurtz et al., MRL 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.mrl-main.22.pdf