Machine Translation Reference-less Evaluation using YiSi-2 with Bilingual Mappings of Massive Multilingual Language Model

Chi-kiu Lo, Samuel Larkin


Abstract
We present a study on using YiSi-2 with massive multilingual pretrained language models for machine translation (MT) reference-less evaluation. Aiming at finding a better semantic representation for semantic MT evaluation, we first test YiSi-2 with contextual embeddings extracted from different layers of two different pretrained models, multilingual BERT and XLM-RoBERTa. We also experiment with learning bilingual mappings that transform the vector subspace of the source language to be closer to that of the target language in the pretrained model to obtain more accurate cross-lingual semantic similarity representations. Our results show that YiSi-2’s correlation with human direct assessment on translation quality is greatly improved by replacing multilingual BERT with XLM-RoBERTa and projecting the source embeddings into the target embedding space using a cross-lingual linear projection (CLP) matrix learnt from a small development set.
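To make the CLP step concrete, below is a minimal NumPy sketch of learning a linear projection from aligned source/target embeddings and applying it before a cosine-similarity comparison. It assumes contextual embeddings for aligned source–target token pairs from a small development set, stacked row-wise; the function names (learn_clp_matrix, project) and the least-squares formulation are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def learn_clp_matrix(src_emb: np.ndarray, tgt_emb: np.ndarray) -> np.ndarray:
    """Learn a cross-lingual linear projection (CLP) matrix W that maps
    source-language embeddings toward the target-language subspace by
    solving the least-squares problem  min_W ||src_emb @ W - tgt_emb||_F.

    src_emb, tgt_emb: (n_pairs, dim) arrays of contextual embeddings for
    aligned source/target tokens from a small development set.
    """
    W, *_ = np.linalg.lstsq(src_emb, tgt_emb, rcond=None)
    return W

def project(src_emb: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project source embeddings into the target embedding space."""
    return src_emb @ W

# Toy usage with random stand-in embeddings (dim = 768, as in one layer
# of multilingual BERT or XLM-RoBERTa base); real inputs would be
# extracted from the pretrained model over the dev-set token pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 768))   # source-side dev-set embeddings
Y = rng.normal(size=(500, 768))   # aligned target-side embeddings
W = learn_clp_matrix(X, Y)
X_proj = project(X, W)

# Cross-lingual cosine similarity between a projected source vector and
# a target vector, of the kind YiSi-2's lexical similarity relies on.
cos = (X_proj[0] @ Y[0]) / (np.linalg.norm(X_proj[0]) * np.linalg.norm(Y[0]))
```

A linear least-squares map is one common way to align two embedding subspaces from a small seed dictionary; orthogonality-constrained variants (Procrustes) are another, and the sketch above does not claim to reproduce the paper's exact training procedure.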
Anthology ID:
2020.wmt-1.100
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
903–910
URL:
https://aclanthology.org/2020.wmt-1.100
Cite (ACL):
Chi-kiu Lo and Samuel Larkin. 2020. Machine Translation Reference-less Evaluation using YiSi-2 with Bilingual Mappings of Massive Multilingual Language Model. In Proceedings of the Fifth Conference on Machine Translation, pages 903–910, Online. Association for Computational Linguistics.
Cite (Informal):
Machine Translation Reference-less Evaluation using YiSi-2 with Bilingual Mappings of Massive Multilingual Language Model (Lo & Larkin, WMT 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.wmt-1.100.pdf
Video:
https://slideslive.com/38939653