Abstract
Word-level quality scores on input source sentences can provide useful feedback to an end-user translating into an unfamiliar target language. Recent approaches either require training dedicated word-scoring models on synthetic data or repeatedly invoking the translation model. We propose a simple approach based on comparing the difference of probabilities from two language models: the basic premise is to measure how well each source word is explained by the target sentence versus by a source-side language model. Our approach yields F1 scores up to five points higher than state-of-the-art methods on three language pairs, is significantly faster, and requires no new model training. We also release a public dataset of word omissions and mistranslations for a new language pair.

- Anthology ID: 2022.emnlp-main.732
- Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates
- Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 10683–10691
- URL: https://aclanthology.org/2022.emnlp-main.732
- DOI: 10.18653/v1/2022.emnlp-main.732
- Cite (ACL): Priyesh Jain, Sunita Sarawagi, and Tushar Tomar. 2022. Quality Scoring of Source Words in Neural Translation Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10683–10691, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal): Quality Scoring of Source Words in Neural Translation Models (Jain et al., EMNLP 2022)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2022.emnlp-main.732.pdf
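The scoring idea described in the abstract (how well each source word is explained by the target sentence versus by a source language model) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name is hypothetical, and the log-probabilities are made-up numbers standing in for the outputs of the two real models.

```python
# Hedged sketch of per-word quality scoring by comparing two log-probabilities.
# All numbers below are toy stand-ins, NOT outputs of the paper's actual models.

def score_source_words(src_words, logp_given_target, logp_source_lm, threshold=0.0):
    """Score each source word as the difference between
    (a) its log-probability when conditioned on the produced target sentence and
    (b) its log-probability under a source-only language model.
    Words scoring below `threshold` are poorly explained by the translation
    (possible omission or mistranslation). Both lists are assumed precomputed.
    """
    assert len(src_words) == len(logp_given_target) == len(logp_source_lm)
    results = []
    for word, lp_tgt, lp_src in zip(src_words, logp_given_target, logp_source_lm):
        score = lp_tgt - lp_src  # high score: target sentence explains the word well
        results.append((word, score, score < threshold))
    return results

# Toy example: "bank" is far less likely given the target sentence than under
# the source LM, so it is flagged as a likely translation error.
words = ["the", "bank", "closed"]
lp_tgt = [-0.2, -5.0, -0.5]  # log P(word | target sentence)   (made up)
lp_src = [-0.3, -2.0, -0.8]  # log P(word | source LM context) (made up)
for word, score, flagged in score_source_words(words, lp_tgt, lp_src):
    print(word, round(score, 2), "FLAGGED" if flagged else "ok")
```

In practice the two log-probabilities would come from a translation model scoring the source given the target and from a monolingual source LM; the subtraction normalizes away words that are simply rare in the source language.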