Priyesh Jain


2022

Quality Scoring of Source Words in Neural Translation Models
Priyesh Jain | Sunita Sarawagi | Tushar Tomar
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Word-level quality scores on input source sentences can provide useful feedback to an end-user when translating into an unfamiliar target language. Recent approaches either require training special word-scoring models on synthetic data or require repeated invocation of the translation model. We propose a simple approach based on the difference between the probabilities assigned by two language models. The basic premise of our method is to reason about how well each source word is explained by the target sentence, compared with how well it is explained by the source language model alone. Our approach achieves F1 scores up to five points higher than state-of-the-art methods on three language pairs, and is significantly faster. Our method also does not require training any new model. We release a public dataset of word omissions and mistranslations for a new language pair.
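The premise above can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so everything here is an assumption: the score is taken to be the per-word difference between a log-probability conditioned on the target sentence and a log-probability from a source-only language model, and the function names, the threshold, and the numbers are hypothetical. The log-probabilities are supplied as plain lists rather than computed by real models.

```python
from typing import List, Tuple


def score_source_words(
    words: List[str],
    logp_given_target: List[float],
    logp_source_lm: List[float],
) -> List[Tuple[str, float]]:
    """Pair each source word with an assumed quality score.

    Score = log p(word | target sentence, context) - log p(word | context).
    A high score means the target sentence explains the word better than
    the source language model alone. This formula is an assumption, not
    the paper's stated definition.
    """
    if not (len(words) == len(logp_given_target) == len(logp_source_lm)):
        raise ValueError("all inputs must have one entry per source word")
    return [
        (w, cond - prior)
        for w, cond, prior in zip(words, logp_given_target, logp_source_lm)
    ]


def flag_low_quality(
    scored: List[Tuple[str, float]], threshold: float = 0.0
) -> List[str]:
    """Return words whose score falls below a (hypothetical) threshold."""
    return [w for w, s in scored if s < threshold]


# Illustrative, made-up log-probabilities for a three-word source sentence.
words = ["the", "red", "car"]
logp_given_target = [-0.5, -4.0, -0.25]  # hypothetical target-conditioned model
logp_source_lm = [-1.0, -2.5, -2.0]      # hypothetical source-only language model

scored = score_source_words(words, logp_given_target, logp_source_lm)
flagged = flag_low_quality(scored)
print(scored)
print(flagged)
```

In this toy input, "red" is less probable given the target than under the source model alone, so it is the word a user might be warned about as possibly omitted or mistranslated. In practice the threshold would need to be tuned on labelled data.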