Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search
Xiang Geng, Yu Zhang, Zhejian Lai, Shuaijie She, Wei Zou, Shimin Tao, Hao Yang, Jiajun Chen, Shujian Huang
Abstract
Machine translation (MT) quality estimation (QE) is a crucial task to estimate the quality of MT outputs when reference translations are unavailable. Many studies focus on generating pseudo data using large parallel corpus and achieve remarkable success in the supervised setting. However, pseudo data solutions are less satisfying in unsupervised scenarios because the pseudo labels are inaccurate or the pseudo translations differ from the real ones. To address these problems, we propose to generate pseudo data using the MT model with constrained beam search (CBSQE). CBSQE preserves the reference parts with high MT probabilities as correct translations, while the rest parts as the wrong ones for MT generation. Therefore, CBSQE can reduce the false negative labels caused by synonyms. Overall, beam search will prefer a more real hypothesis with a higher MT generation likelihood. Extensive experiments demonstrate that CBSQE outperforms strong baselines in both supervised and unsupervised settings. Analyses further show the superiority of CBSQE. The code is available at https://github.com/NJUNLP/njuqe.- Anthology ID:
- 2023.emnlp-main.764
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 12434–12447
- Language:
- URL:
- https://aclanthology.org/2023.emnlp-main.764
- DOI:
- 10.18653/v1/2023.emnlp-main.764
- Cite (ACL):
- Xiang Geng, Yu Zhang, Zhejian Lai, Shuaijie She, Wei Zou, Shimin Tao, Hao Yang, Jiajun Chen, and Shujian Huang. 2023. Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12434–12447, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search (Geng et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2023.emnlp-main.764.pdf