Abstract
Low-resource machine translation (LRMT) poses a substantial challenge due to the scarcity of parallel training data. This paper introduces a new method to improve the transfer of the embedding layer from a high-resource Parent model to a low-resource Child model in LRMT, utilizing the trained token embeddings of the Parent model’s high-resource vocabulary. Our approach projects all tokens into a shared semantic space and measures the semantic similarity between tokens in the low-resource and high-resource languages. These similarity measures are then used to initialize the token representations in the Child model’s low-resource vocabulary. We evaluate our approach on three benchmark datasets of low-resource language pairs: Myanmar-English, Indonesian-English, and Turkish-English. The experimental results demonstrate that our method outperforms previous methods in translation quality. Additionally, our approach is computationally efficient, reducing training time compared to prior work.
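The core idea described above is that each low-resource token embedding is initialized from the trained embeddings of semantically similar high-resource tokens. Below is a minimal NumPy sketch of that general recipe, not the authors’ exact procedure: the shared-space projections are assumed to be given, and the top-k cutoff and softmax weighting are illustrative assumptions.

```python
import numpy as np

def init_child_embeddings(parent_emb, parent_shared, child_shared, top_k=10):
    """Sketch of similarity-based embedding transfer (illustrative, not the paper's exact method).

    parent_emb:    (P, d) trained embedding table of the Parent (high-resource) model
    parent_shared: (P, s) Parent-vocabulary tokens projected into a shared semantic space
    child_shared:  (C, s) Child-vocabulary (low-resource) tokens in the same shared space
    Returns a (C, d) initialization for the Child model's embedding layer.
    """
    # Cosine similarity between every child token and every parent token
    # in the shared semantic space.
    p = parent_shared / np.linalg.norm(parent_shared, axis=1, keepdims=True)
    c = child_shared / np.linalg.norm(child_shared, axis=1, keepdims=True)
    sim = c @ p.T  # (C, P)

    child_emb = np.zeros((child_shared.shape[0], parent_emb.shape[1]))
    for i in range(sim.shape[0]):
        # Keep only the top-k most similar parent tokens for this child token.
        idx = np.argsort(sim[i])[-top_k:]
        # Softmax over the retained similarities (assumed weighting scheme).
        w = np.exp(sim[i, idx])
        w /= w.sum()
        # Child token i starts as a similarity-weighted mix of parent embeddings.
        child_emb[i] = w @ parent_emb[idx]
    return child_emb
```

In such a setup, the resulting table would replace random initialization of the Child model’s embedding layer before fine-tuning on the low-resource pair, while the rest of the Parent model’s parameters are transferred directly.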
- Anthology ID: 2023.mtsummit-research.11
- Volume: Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track
- Month: September
- Year: 2023
- Address: Macau SAR, China
- Editors: Masao Utiyama, Rui Wang
- Venue: MTSummit
- Publisher: Asia-Pacific Association for Machine Translation
- Pages: 123–134
- URL: https://aclanthology.org/2023.mtsummit-research.11
- Cite (ACL): Van Hien Tran, Chenchen Ding, Hideki Tanaka, and Masao Utiyama. 2023. Improving Embedding Transfer for Low-Resource Machine Translation. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 123–134, Macau SAR, China. Asia-Pacific Association for Machine Translation.
- Cite (Informal): Improving Embedding Transfer for Low-Resource Machine Translation (Tran et al., MTSummit 2023)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2023.mtsummit-research.11.pdf