Abstract
Achieving satisfactory machine translation performance on domains for which there is no training data is challenging. Traditional supervised domain adaptation is not suitable for addressing such zero-resource domains because it relies on in-domain parallel data. We show that when in-domain parallel data is not available, access to document-level context enables better capturing of domain generalities than access to a single sentence alone: more information yields a more reliable estimate of the domain. We present two document-level Transformer models capable of using large context sizes and compare them against strong Transformer baselines, obtaining improvements on the two zero-resource domains we study. We additionally provide an analysis in which we vary the amount of context and examine the case where in-domain data is available.
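The paper's two Transformer architectures are described in the full text; as a rough illustration of the underlying idea only, the sketch below shows one generic way of exposing preceding source sentences as document-level context to an NMT system. All names here (the separator token, the function, the example sentences) are hypothetical and not taken from the paper, whose models integrate context inside the Transformer rather than via simple input concatenation:

```python
# Illustrative sketch only: prepend up to `context_size` preceding
# source sentences to each sentence, so a translation model sees
# document-level evidence for estimating an unseen domain.
from typing import List

CONTEXT_SEP = "<sep>"  # hypothetical marker between context and current sentence


def build_contextual_inputs(document: List[str], context_size: int) -> List[str]:
    """For each sentence, prepend up to `context_size` preceding sentences.

    A larger `context_size` gives the model more evidence about the
    document's domain, which is the intuition behind the abstract's claim.
    """
    inputs = []
    for i, sentence in enumerate(document):
        context = document[max(0, i - context_size):i]
        if context:
            inputs.append(" ".join(context) + f" {CONTEXT_SEP} " + sentence)
        else:
            inputs.append(sentence)
    return inputs


if __name__ == "__main__":
    doc = [
        "The patient was admitted with chest pain.",
        "An ECG was performed on arrival.",
        "It showed no abnormalities.",  # "It" is ambiguous without context
    ]
    for line in build_contextual_inputs(doc, context_size=2):
        print(line)
```

The last example sentence hints at why context helps: both pronoun resolution and domain cues (here, clinical vocabulary) are only visible when neighboring sentences are available to the model.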
- Anthology ID: 2021.adaptnlp-1.9
- Volume: Proceedings of the Second Workshop on Domain Adaptation for NLP
- Month: April
- Year: 2021
- Address: Kyiv, Ukraine
- Editors: Eyal Ben-David, Shay Cohen, Ryan McDonald, Barbara Plank, Roi Reichart, Guy Rotman, Yftah Ziser
- Venue: AdaptNLP
- Publisher: Association for Computational Linguistics
- Pages: 80–93
- URL: https://aclanthology.org/2021.adaptnlp-1.9
- Cite (ACL): Dario Stojanovski and Alexander Fraser. 2021. Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation. In Proceedings of the Second Workshop on Domain Adaptation for NLP, pages 80–93, Kyiv, Ukraine. Association for Computational Linguistics.
- Cite (Informal): Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation (Stojanovski & Fraser, AdaptNLP 2021)
- PDF: https://aclanthology.org/2021.adaptnlp-1.9.pdf