DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation
Cheonbok Park, Hantae Kim, Ioan Calapodescu, Hyun Chang Cho, Vassilina Nikoulina
Abstract
Domain Adaptation (DA) of a Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model, which is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the amount of parallel samples it would require. It is, however, a desirable functionality that could help MT practitioners make an informed decision before investing resources in dataset creation. We propose a Domain adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language. Our model relies on the NMT encoder representations combined with various instance-level and corpus-level features. We demonstrate that the instance-level framework is better able to distinguish between different domains than the corpus-level frameworks proposed in previous studies. Finally, we perform in-depth analyses of the results, highlighting the limitations of our approach, and provide directions for future research.
- Anthology ID:
- 2022.findings-acl.141
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Smaranda Muresan, Preslav Nakov, Aline Villavicencio
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1789–1807
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2022.findings-acl.141/
- DOI:
- 10.18653/v1/2022.findings-acl.141
- Cite (ACL):
- Cheonbok Park, Hantae Kim, Ioan Calapodescu, Hyun Chang Cho, and Vassilina Nikoulina. 2022. DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1789–1807, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation (Park et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2022.findings-acl.141.pdf