DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation

Cheonbok Park, Hantae Kim, Ioan Calapodescu, Hyun Chang Cho, Vassilina Nikoulina


Abstract
Domain Adaptation (DA) of a Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model that is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the number of parallel samples it would require. Such an estimate is nevertheless a desirable functionality that could help MT practitioners make an informed decision before investing resources in dataset creation. We propose a Domain adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language. Our model relies on the NMT encoder representations combined with various instance- and corpus-level features. We demonstrate that the instance-level approach is better able to distinguish between different domains than the corpus-level frameworks proposed in previous studies. Finally, we perform in-depth analyses of the results, highlighting the limitations of our approach, and provide directions for future research.
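The abstract describes predicting post-adaptation performance from encoder representations of in-domain source-language samples, combined with instance- and corpus-level features. The snippet below is a minimal illustrative sketch of that general idea, not the authors' implementation: the helper names (instance_features, corpus_features), the mean/std pooling, the gradient-boosted regressor, and the toy data are all assumptions standing in for real NMT encoder outputs and measured learning-curve points.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def instance_features(encoder_states):
    """Mean-pool per-token encoder states (T x d) into one sentence-level vector."""
    return encoder_states.mean(axis=0)

def corpus_features(instance_vectors):
    """Aggregate instance vectors (N x d) into corpus-level statistics (mean and std)."""
    return np.concatenate([instance_vectors.mean(axis=0), instance_vectors.std(axis=0)])

# Toy setup: random arrays stand in for real NMT encoder states of monolingual samples.
rng = np.random.default_rng(0)
domains = [[rng.normal(size=(int(rng.integers(5, 30)), 16)) for _ in range(50)]
           for _ in range(8)]
X = np.stack([corpus_features(np.stack([instance_features(s) for s in sents]))
              for sents in domains])
y = rng.uniform(10, 40, size=len(domains))  # stand-in for observed post-DA BLEU scores

# Regressor mapping domain features to prospective DA performance (illustrative choice).
reg = GradientBoostingRegressor().fit(X, y)
print(reg.predict(X[:2]))
```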
Anthology ID:
2022.findings-acl.141
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1789–1807
URL:
https://aclanthology.org/2022.findings-acl.141
DOI:
10.18653/v1/2022.findings-acl.141
Cite (ACL):
Cheonbok Park, Hantae Kim, Ioan Calapodescu, Hyun Chang Cho, and Vassilina Nikoulina. 2022. DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1789–1807, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation (Park et al., Findings 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.findings-acl.141.pdf