Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation

Marine Carpuat, Yogarshi Vyas, Xing Niu


Abstract
Parallel corpora are often not as parallel as one might assume: non-literal translations and noisy translations abound, even in curated corpora routinely used for training and evaluation. We use a cross-lingual textual entailment system to distinguish sentence pairs that are parallel in meaning from those that are not, and show that filtering out divergent examples from training improves translation quality.
Anthology ID:
W17-3209
Volume:
Proceedings of the First Workshop on Neural Machine Translation
Month:
August
Year:
2017
Address:
Vancouver
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
69–79
URL:
https://aclanthology.org/W17-3209
DOI:
10.18653/v1/W17-3209
Cite (ACL):
Marine Carpuat, Yogarshi Vyas, and Xing Niu. 2017. Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation. In Proceedings of the First Workshop on Neural Machine Translation, pages 69–79, Vancouver. Association for Computational Linguistics.
Cite (Informal):
Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation (Carpuat et al., NGT 2017)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W17-3209.pdf
Data
OpenSubtitles