Quality Estimation and Translation Metrics via Pre-trained Word and Sentence Embeddings

Elizaveta Yankovskaya; Andre Tättar; Mark Fishel

doi:10.18653/v1/W19-5410

Abstract

We propose using pre-trained embeddings as features of a regression model for sentence-level quality estimation of machine translation. In the first method, we combine freely available BERT and LASER multilingual embeddings to train a neural regression model. In the second method, the input features include not only pre-trained embeddings but also the log probability produced by any machine translation (MT) system. Both methods are applied to several language pairs and evaluated both as a classical quality estimation system (predicting the HTER score) and as an MT metric (predicting human judgements of translation quality).
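The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_features`, the toy vectors, and the choice to simply concatenate the embeddings are all assumptions made for illustration. Real BERT and LASER embeddings would have hundreds of dimensions, and the resulting vector would be passed to a regression model, which is not shown here.

```python
from typing import List, Optional, Sequence


def build_features(
    embeddings: Sequence[Sequence[float]],
    mt_log_prob: Optional[float] = None,
) -> List[float]:
    """Concatenate pre-trained embedding vectors into one feature vector.

    Hypothetical helper: if mt_log_prob is given, it is appended as one
    extra feature, mirroring the second method in the abstract.
    """
    features: List[float] = []
    for vec in embeddings:
        features.extend(float(x) for x in vec)
    if mt_log_prob is not None:
        features.append(float(mt_log_prob))
    return features


# Toy stand-ins for real embeddings (assumed values, not real model output).
bert_vec = [0.1, 0.2]
laser_vec = [0.3, 0.4, 0.5]

# First method: embeddings only.
features_emb = build_features([bert_vec, laser_vec])

# Second method: embeddings plus the MT system's log probability.
features_emb_lp = build_features([bert_vec, laser_vec], mt_log_prob=-1.7)
```

In this sketch the two methods differ only in whether the log probability is appended, so the same downstream regressor could be trained on either feature set.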

Anthology ID: W19-5410
Volume: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Month: August
Year: 2019
Address: Florence, Italy
Venues: ACL | WMT | WS
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 101–105
URL: https://aclanthology.org/W19-5410
DOI: 10.18653/v1/W19-5410
Cite (ACL): Elizaveta Yankovskaya, Andre Tättar, and Mark Fishel. 2019. Quality Estimation and Translation Metrics via Pre-trained Word and Sentence Embeddings. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 101–105, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Quality Estimation and Translation Metrics via Pre-trained Word and Sentence Embeddings (Yankovskaya et al., 2019)
PDF: https://preview.aclanthology.org/update-css-js/W19-5410.pdf
