Abstract
We present a study on using YiSi-2 with massive multilingual pretrained language models for machine translation (MT) reference-less evaluation. Aiming at finding better semantic representations for semantic MT evaluation, we first test YiSi-2 with contextual embeddings extracted from different layers of two different pretrained models, multilingual BERT and XLM-RoBERTa. We also experiment with learning bilingual mappings that transform the vector subspace of the source language to be closer to that of the target language in the pretrained model, in order to obtain more accurate cross-lingual semantic similarity representations. Our results show that YiSi-2's correlation with human direct assessment of translation quality is greatly improved by replacing multilingual BERT with XLM-RoBERTa and projecting the source embeddings into the target embedding space using a cross-lingual linear projection (CLP) matrix learnt from a small development set.
- Anthology ID:
- 2020.wmt-1.100
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 903–910
- URL:
- https://aclanthology.org/2020.wmt-1.100
- Cite (ACL):
- Chi-kiu Lo and Samuel Larkin. 2020. Machine Translation Reference-less Evaluation using YiSi-2 with Bilingual Mappings of Massive Multilingual Language Model. In Proceedings of the Fifth Conference on Machine Translation, pages 903–910, Online. Association for Computational Linguistics.
- Cite (Informal):
- Machine Translation Reference-less Evaluation using YiSi-2 with Bilingual Mappings of Massive Multilingual Language Model (Lo & Larkin, WMT 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.wmt-1.100.pdf
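The cross-lingual linear projection (CLP) described in the abstract can be sketched as an orthogonal Procrustes mapping learnt from aligned source/target embeddings of a small development set. The NumPy sketch below is illustrative only: the function name, toy data, and the choice of the closed-form orthogonal solution are assumptions, not the authors' actual implementation, which operates on contextual embeddings from multilingual BERT or XLM-RoBERTa.

```python
import numpy as np

def learn_clp(src_emb, tgt_emb):
    """Learn an orthogonal matrix W that projects source-language
    embeddings into the target embedding space.

    Solves the orthogonal Procrustes problem
        min_W ||src_emb @ W - tgt_emb||_F  s.t.  W^T W = I
    whose closed-form solution is W = U V^T, where
    U S V^T is the SVD of src_emb^T @ tgt_emb.
    Rows of src_emb and tgt_emb are aligned embedding pairs
    (e.g. from a small parallel development set).
    """
    u, _, vt = np.linalg.svd(src_emb.T @ tgt_emb)
    return u @ vt  # shape (d, d)

# Toy aligned "dev set": 5 pairs of 4-dimensional embeddings where the
# source space is an exact rotation of the target space, so the learnt
# projection should recover that rotation.
rng = np.random.default_rng(0)
tgt = rng.normal(size=(5, 4))
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # a random rotation
src = tgt @ q.T

W = learn_clp(src, tgt)
print(np.allclose(src @ W, tgt, atol=1e-8))  # → True
```

In the paper's setting the projected source embeddings, rather than the raw ones, would then feed YiSi-2's cross-lingual cosine-similarity computation.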