@inproceedings{basu-etal-2018-keep,
  title     = {Keep It or Not: Word Level Quality Estimation for Post-Editing},
  author    = {Basu, Prasenjit and
               Pal, Santanu and
               Naskar, Sudip Kumar},
  editor    = {Bojar, Ond{\v{r}}ej and
               Chatterjee, Rajen and
               Federmann, Christian and
               Fishel, Mark and
               Graham, Yvette and
               Haddow, Barry and
               Huck, Matthias and
               Yepes, Antonio Jimeno and
               Koehn, Philipp and
               Monz, Christof and
               Negri, Matteo and
               N{\'e}v{\'e}ol, Aur{\'e}lie and
               Neves, Mariana and
               Post, Matt and
               Specia, Lucia and
               Turchi, Marco and
               Verspoor, Karin},
  booktitle = {Proceedings of the Third Conference on Machine Translation: Shared Task Papers},
  month     = oct,
  year      = {2018},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/W18-6457/},
  doi       = {10.18653/v1/W18-6457},
  pages     = {759--764},
  abstract  = {The paper presents our participation in the WMT 2018 shared task on word level quality estimation (QE) of machine translated (MT) text, i.e., to predict whether a word in MT output for a given source context is correctly translated and hence should be retained in the post-edited translation (PE), or not. To perform the QE task, we measure the similarity of the source context of the target MT word with the context for which the word is retained in PE in the training data. This is achieved in two different ways, using \textit{Bag-of-Words} (\textit{BoW}) model and \textit{Document-to-Vector} (\textit{Doc2Vec}) model. In the \textit{BoW} model, we compute the cosine similarity while in the \textit{Doc2Vec} model we consider the Doc2Vec similarity. By applying the Kneedle algorithm on the F1mult vs. similarity score plot, we derive the threshold based on which OK/BAD decisions are taken for the MT words. Experimental results revealed that the Doc2Vec model performs better than the BoW model on the word level QE task.},
}
Markdown (Informal)
[Keep It or Not: Word Level Quality Estimation for Post-Editing](https://aclanthology.org/W18-6457/) (Basu et al., WMT 2018)
ACL