Improving Machine Translation of Rare and Unseen Word Senses
Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, Anna Korhonen
Abstract
The performance of NMT systems has improved drastically in the past few years but the translation of multi-sense words still poses a challenge. Since word senses are not represented uniformly in the parallel corpora used for training, there is an excessive use of the most frequent sense in MT output. In this work, we propose CmBT (Contextually-mined Back-Translation), an approach for improving multi-sense word translation leveraging pre-trained cross-lingual contextual word representations (CCWRs). Because of their contextual sensitivity and their large pre-training data, CCWRs can easily capture word senses that are missing or very rare in parallel corpora used to train MT. Specifically, CmBT applies bilingual lexicon induction on CCWRs to mine sense-specific target sentences from a monolingual dataset, and then back-translates these sentences to generate a pseudo parallel corpus as additional training data for an MT system. We test the translation quality of ambiguous words on the MuCoW test suite, which was built to test the word sense disambiguation effectiveness of MT systems. We show that our system improves on the translation of difficult unseen and low frequency word senses.- Anthology ID:
- 2021.wmt-1.66
- Volume:
- Proceedings of the Sixth Conference on Machine Translation
- Month:
- November
- Year:
- 2021
- Address:
- Online
- Editors:
- Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 614–624
- Language:
- URL:
- https://aclanthology.org/2021.wmt-1.66
- DOI:
- Cite (ACL):
- Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, and Anna Korhonen. 2021. Improving Machine Translation of Rare and Unseen Word Senses. In Proceedings of the Sixth Conference on Machine Translation, pages 614–624, Online. Association for Computational Linguistics.
- Cite (Informal):
- Improving Machine Translation of Rare and Unseen Word Senses (Hangya et al., WMT 2021)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2021.wmt-1.66.pdf