Levenshtein Training for Word-level Quality Estimation
Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn
Abstract
We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation. A Levenshtein Transformer is a natural fit for this task: trained to perform decoding in an iterative manner, it can learn to post-edit without explicit supervision. To further minimize the mismatch between the translation task and the word-level QE task, we propose a two-stage transfer learning procedure on both augmented data and human post-editing data. We also propose heuristics to construct reference labels that are compatible with subword-level finetuning and inference. Results on the WMT 2020 QE shared task dataset show that our proposed method has superior data efficiency under the data-constrained setting and competitive performance under the unconstrained setting.
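For illustration, here is a minimal Python sketch of the underlying idea: deriving word-level OK/BAD tags by aligning an MT hypothesis to its human post-edit with a Levenshtein alignment, plus one simple heuristic for making those labels compatible with subword tokenization. This is not the authors' implementation (see the linked shuoyangd/stenella repository for that); the function names and the subword-inheritance rule are assumptions, and WMT-style gap tags for insertions are omitted for brevity.

```python
from typing import Callable, List, Tuple

def levenshtein_labels(mt: List[str], pe: List[str]) -> List[str]:
    """Tag each MT token OK if it survives unchanged into the post-edit
    (a zero-cost diagonal step in the minimal edit alignment), else BAD."""
    n, m = len(mt), len(pe)
    # dp[i][j] = edit distance between mt[:i] and pe[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if mt[i - 1] == pe[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete MT token
                           dp[i][j - 1] + 1,         # insert PE token
                           dp[i - 1][j - 1] + cost)  # match / substitute
    labels = ["BAD"] * n
    i, j = n, m
    while i > 0 and j > 0:
        if mt[i - 1] == pe[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            labels[i - 1] = "OK"   # token kept verbatim in the post-edit
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1    # substituted token stays BAD
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1                 # deleted token stays BAD
        else:
            j -= 1                 # insertion: no MT token to tag
    return labels

def project_to_subwords(
    words: List[str],
    word_labels: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Hypothetical projection heuristic (an assumption, not the paper's):
    every subword piece simply inherits the label of its source word."""
    pieces, piece_labels = [], []
    for word, label in zip(words, word_labels):
        subs = tokenize(word)
        pieces.extend(subs)
        piece_labels.extend([label] * len(subs))
    return pieces, piece_labels

if __name__ == "__main__":
    mt = "the cat sit on mat".split()
    pe = "the cat sat on the mat".split()
    print(levenshtein_labels(mt, pe))
    # -> ['OK', 'OK', 'BAD', 'OK', 'OK']  ("sit" substituted; "the" inserted)
```

The WMT shared task additionally tags the gaps between tokens to mark insertions, and the paper's two-stage procedure first trains on augmented data before finetuning on human post-edits; neither is reproduced in this sketch.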
- Anthology ID: 2021.emnlp-main.539
- Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2021
- Address: Online and Punta Cana, Dominican Republic
- Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 6724–6733
- URL: https://aclanthology.org/2021.emnlp-main.539
- DOI: 10.18653/v1/2021.emnlp-main.539
- Cite (ACL): Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, and Philipp Koehn. 2021. Levenshtein Training for Word-level Quality Estimation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6724–6733, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Levenshtein Training for Word-level Quality Estimation (Ding et al., EMNLP 2021)
- PDF: https://aclanthology.org/2021.emnlp-main.539.pdf
- Code: shuoyangd/stenella