Improving Machine Translation of Rare and Unseen Word Senses

Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, Anna Korhonen


Abstract
The performance of NMT systems has improved drastically in the past few years, but the translation of multi-sense words still poses a challenge. Since word senses are not represented uniformly in the parallel corpora used for training, MT output overuses the most frequent sense. In this work, we propose CmBT (Contextually-mined Back-Translation), an approach for improving multi-sense word translation that leverages pre-trained cross-lingual contextual word representations (CCWRs). Because of their contextual sensitivity and their large pre-training data, CCWRs can easily capture word senses that are missing or very rare in the parallel corpora used to train MT systems. Specifically, CmBT applies bilingual lexicon induction on CCWRs to mine sense-specific target sentences from a monolingual dataset, and then back-translates these sentences to generate a pseudo-parallel corpus as additional training data for an MT system. We test the translation quality of ambiguous words on the MuCoW test suite, which was built to test the word sense disambiguation effectiveness of MT systems. We show that our system improves the translation of difficult unseen and low-frequency word senses.
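The mine-then-back-translate idea summarized above can be illustrated with a toy Python sketch. It is not the authors' implementation: it assumes each source word sense is summarized by a single prototype vector, and it replaces bilingual lexicon induction on CCWRs with a plain cosine-similarity threshold against the contextual embedding of a candidate target word. The vectors, the threshold, and the `back_translate` callable are hypothetical stand-ins for real CCWRs and a trained target-to-source MT model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_sense_sentences(
    sense_prototype: Vector,
    candidates: List[Tuple[str, Vector]],
    threshold: float,
) -> List[str]:
    """Keep target-language sentences whose contextual embedding of the
    candidate word lies close to the prototype of one source-word sense.
    Simplification (assumption): a cosine threshold stands in for BLI on CCWRs."""
    return [sent for sent, emb in candidates if cosine(sense_prototype, emb) >= threshold]


def build_pseudo_parallel(
    target_sentences: List[str],
    back_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each mined (genuine) target sentence with a synthetic source
    sentence produced by a target-to-source MT model (hypothetical callable)."""
    return [(back_translate(t), t) for t in target_sentences]


# Toy example: English "bank" in its river sense, German target side.
river_sense = [1.0, 0.0]  # hypothetical prototype vector for the river sense
candidates = [
    ("Wir saßen am Ufer des Flusses.", [0.9, 0.1]),   # embedding of "Ufer" (river sense)
    ("Die Bank hat heute geschlossen.", [0.1, 0.95]),  # embedding of "Bank" (financial sense)
]
# Hypothetical lookup table standing in for a German-to-English MT model.
toy_bt = {"Wir saßen am Ufer des Flusses.": "We sat on the bank of the river."}

mined = mine_sense_sentences(river_sense, candidates, threshold=0.8)
pairs = build_pseudo_parallel(mined, lambda s: toy_bt[s])
print(pairs)
```

In a real pipeline the embeddings would come from a pre-trained cross-lingual encoder, and the resulting (synthetic source, genuine target) pairs would be added to the MT system's parallel training data.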
Anthology ID:
2021.wmt-1.66
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Editors:
Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
614–624
URL:
https://aclanthology.org/2021.wmt-1.66
Cite (ACL):
Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, and Anna Korhonen. 2021. Improving Machine Translation of Rare and Unseen Word Senses. In Proceedings of the Sixth Conference on Machine Translation, pages 614–624, Online. Association for Computational Linguistics.
Cite (Informal):
Improving Machine Translation of Rare and Unseen Word Senses (Hangya et al., WMT 2021)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2021.wmt-1.66.pdf
Video:
https://preview.aclanthology.org/ingest-2024-clasp/2021.wmt-1.66.mp4