Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings

Marimuthu Kalimuthu, Michael Barz, Daniel Sonntag


Abstract
We study the problem of incremental domain adaptation of a generic neural machine translation model under limited resources (e.g., budget and time) for human translation or model training. In this paper, we propose a novel query strategy for selecting "unlabeled" samples from a new domain based on sentence embeddings for Arabic, which accelerates fine-tuning of the generic model to the target domain. Specifically, our approach estimates the informativeness of instances from the target domain by comparing the distance of their sentence embeddings to embeddings from the generic domain. We perform machine translation experiments (Ar-to-En direction) comparing a random sampling baseline with our new approach, similar to active learning, using two small update sets that simulate the work of human translators. In this setting we can save more than 50% of the annotation costs without loss in quality, demonstrating the effectiveness of our approach.
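The core idea of the query strategy can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' released implementation; see the linked AraSIF repository for that): rank unlabeled target-domain sentences by the cosine distance of their embeddings from a centroid of the generic-domain embeddings, and select the most distant ones as the most informative for human translation.

```python
# Hypothetical sketch of a distance-based query strategy: sentences whose
# embeddings lie far from the generic-domain centroid are assumed to be
# the most informative candidates for annotation. Function and variable
# names here are illustrative, not from the paper.
import numpy as np

def select_informative(target_embs: np.ndarray,
                       generic_embs: np.ndarray,
                       budget: int) -> np.ndarray:
    """Return indices of the `budget` target sentences whose embeddings
    are farthest (by cosine distance) from the generic-domain centroid."""
    centroid = generic_embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    normed = target_embs / np.linalg.norm(target_embs, axis=1, keepdims=True)
    distances = 1.0 - normed @ centroid          # cosine distance to centroid
    return np.argsort(distances)[::-1][:budget]  # farthest first

# Toy usage: random vectors stand in for real sentence embeddings
# (e.g., SIF-style embeddings of Arabic sentences).
rng = np.random.default_rng(0)
generic_embs = rng.normal(size=(100, 300))  # generic-domain sentences
target_embs = rng.normal(size=(50, 300))    # unlabeled target-domain sentences
chosen = select_informative(target_embs, generic_embs, budget=10)
print(len(chosen))
```

In an active-learning loop, the selected sentences would be sent to human translators and the resulting pairs used to fine-tune the generic model incrementally.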
Anthology ID:
W19-4601
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
1–10
URL:
https://aclanthology.org/W19-4601
DOI:
10.18653/v1/W19-4601
Cite (ACL):
Marimuthu Kalimuthu, Michael Barz, and Daniel Sonntag. 2019. Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 1–10, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings (Kalimuthu et al., WANLP 2019)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W19-4601.pdf
Code:
DFKI-Interactive-Machine-Learning/AraSIF