2024
Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
Xu Huang | Zhirui Zhang | Xiang Geng | Yichao Du | Jiajun Chen | Shujian Huang
Findings of the Association for Computational Linguistics ACL 2024
This study investigates how Large Language Models (LLMs) leverage source and reference data in the machine translation evaluation task, aiming to better understand the mechanisms behind their remarkable performance in this task. We design controlled experiments across various input modes and model types, and employ both coarse-grained and fine-grained prompts to discern the utility of source versus reference information. We find that reference information significantly enhances the evaluation accuracy, while, surprisingly, source information is sometimes counterproductive, indicating LLMs’ inability to fully leverage their cross-lingual capability when evaluating translations. Further analysis of the fine-grained evaluation and fine-tuning experiments shows similar results. These findings also suggest a potential research direction for LLMs: fully exploiting their cross-lingual capability to achieve better performance in machine translation evaluation tasks.
MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization
Shuaijie She | Wei Zou | Shujian Huang | Wenhao Zhu | Xiang Liu | Xiang Geng | Jiajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Intuitively, reasoning abilities are considered language-agnostic. However, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in a dominant language like English is superior to reasoning in other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages, which we adopt as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), with improved reasoning consistency across languages. The project is available at https://github.com/NJUNLP/MAPO.
2023
Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search
Xiang Geng | Yu Zhang | Zhejian Lai | Shuaijie She | Wei Zou | Shimin Tao | Hao Yang | Jiajun Chen | Shujian Huang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Machine translation (MT) quality estimation (QE) is a crucial task to estimate the quality of MT outputs when reference translations are unavailable. Many studies focus on generating pseudo data using large parallel corpora and achieve remarkable success in the supervised setting. However, pseudo data solutions are less satisfying in unsupervised scenarios because the pseudo labels are inaccurate or the pseudo translations differ from the real ones. To address these problems, we propose to generate pseudo data using the MT model with constrained beam search (CBSQE). CBSQE preserves the reference parts with high MT probabilities as correct translations, while treating the remaining parts as wrong ones for MT generation. Therefore, CBSQE can reduce the false negative labels caused by synonyms. Overall, beam search will prefer a more realistic hypothesis with a higher MT generation likelihood. Extensive experiments demonstrate that CBSQE outperforms strong baselines in both supervised and unsupervised settings. Analyses further show the superiority of CBSQE. The code is available at https://github.com/NJUNLP/njuqe.
Unify Word-level and Span-level Tasks: NJUNLP’s Participation for the WMT2023 Quality Estimation Shared Task
Xiang Geng | Zhejian Lai | Yu Zhang | Shimin Tao | Hao Yang | Jiajun Chen | Shujian Huang
Proceedings of the Eighth Conference on Machine Translation
We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on both sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on the NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that converts the word-level outputs to fine-grained error span results. Overall, our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks by a considerable margin.
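The abstract does not spell out how word-level outputs are converted to error spans. As a minimal sketch of one plausible conversion (not necessarily the authors' method), the Python below merges consecutive BAD-tagged tokens into character-level spans. The function name `tags_to_spans`, the OK/BAD label set, and the assumption that tokens are joined by single spaces are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def tags_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive BAD tokens into character spans.

    Offsets refer to the string " ".join(tokens); each span is
    (start, end) with an exclusive end. Illustrative assumption only.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[int, int]] = []
    pos = 0        # character offset of the current token
    start = None   # start offset of the currently open BAD run, if any
    end = 0        # end offset of the last BAD token in the open run

    for tok, tag in zip(tokens, tags):
        tok_start, tok_end = pos, pos + len(tok)
        if tag == "BAD":
            if start is None:
                start = tok_start  # open a new span
            end = tok_end          # extend the open span
        elif start is not None:
            spans.append((start, end))  # an OK token closes the span
            start = None
        pos = tok_end + 1  # skip the single joining space

    if start is not None:
        spans.append((start, end))  # close a span that runs to the end
    return spans


tokens = ["Das", "ist", "ein", "kleiner", "Test"]
tags = ["OK", "BAD", "BAD", "OK", "BAD"]
print(tags_to_spans(tokens, tags))
```

In this sketch, adjacent BAD tokens are reported as one span that includes the space between them; a real system would also need to map subword predictions back to words and attach severity labels, which the sketch leaves out.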
2022
NJUNLP’s Participation for the WMT2022 Quality Estimation Shared Task
Xiang Geng | Yu Zhang | Shujian Huang | Shimin Tao | Hao Yang | Jiajun Chen
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents the submissions of the NJUNLP team to the WMT 2022 Quality Estimation shared task 1, where the goal is to predict the sentence-level and word-level quality for target machine translations. Our system explores pseudo data and multi-task learning. We propose several novel methods to generate pseudo data for different annotations using the conditional masked language model and the neural machine translation model. The proposed methods control the decoding process to generate more realistic pseudo translations. We pre-train the XLMR-large model with pseudo data and then fine-tune this model with real data, in both cases with multi-task learning. We jointly learn sentence-level scores (with regression and rank tasks) and word-level tags (with a sequence tagging task). Our system obtains competitive results on different language pairs and ranks first on both sentence- and word-level sub-tasks of the English-German language pair.
CrossQE: HW-TSC 2022 Submission for the Quality Estimation Shared Task
Shimin Tao | Su Chang | Ma Miaomiao | Hao Yang | Xiang Geng | Shujian Huang | Min Zhang | Jiaxin Guo | Minghan Wang | Yinglu Li
Proceedings of the Seventh Conference on Machine Translation (WMT)
Quality estimation (QE) investigates automatic methods for estimating the quality of machine translation results without reference translations. This paper presents Huawei Translation Services Center’s (HW-TSC’s) work called CrossQE in WMT 2022 QE shared tasks 1 and 2, namely sentence- and word-level quality prediction and explainable QE. CrossQE employs the predictor-estimator framework for task 1, concretely with a pre-trained cross-lingual XLM-RoBERTa large as predictor and a task-specific classifier or regressor as estimator. An extensive set of experimental results shows that after adding a bottleneck adapter layer, mean teacher loss, masked language modeling task loss and MC dropout methods in CrossQE, the performance has improved to a certain extent. For task 2, CrossQE calculated the cosine similarity between each word feature in the target and each word feature in the source using the predictor of the task 1 sentence-level QE system, and used the inverse value of the maximum similarity between each word in the target and the source as the word translation error risk value. Moreover, CrossQE has outstanding performance on QE test sets of WMT 2022.
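The task 2 scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word features would come from the task 1 predictor, but here they are plain lists of floats; the name `word_risk` is hypothetical; and "inverse value" is interpreted as 1 minus the maximum cosine similarity, which is one reading of the abstract rather than a confirmed detail.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_risk(target_feats: List[Sequence[float]],
              source_feats: List[Sequence[float]]) -> List[float]:
    """Risk per target word: 1 - max cosine similarity to any source word.

    Assumption: "inverse value" is read as (1 - similarity), so a target
    word with a close match in the source receives a low risk.
    """
    if not source_feats:
        raise ValueError("source_feats must not be empty")
    return [
        1.0 - max(cosine(t, s) for s in source_feats)
        for t in target_feats
    ]


# Toy 2-dimensional features standing in for predictor outputs.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
print(word_risk(target, source))
```

The first target word matches a source word exactly and gets a risk near zero, while the third has no similar source word and gets the highest risk of the three.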
2021
HW-TSC’s Participation at WMT 2021 Quality Estimation Shared Task
Yimeng Chen | Chang Su | Yingtao Zhang | Yuxia Wang | Xiang Geng | Hao Yang | Shimin Tao | Guo Jiaxin | Wang Minghan | Min Zhang | Yujia Liu | Shujian Huang
Proceedings of the Sixth Conference on Machine Translation
This paper presents our work in the WMT 2021 Quality Estimation (QE) Shared Task. We participated in all three sub-tasks, namely the Sentence-Level Direct Assessment (DA) task, the Word and Sentence-Level Post-editing Effort task and the Critical Error Detection task, in all language pairs. Our systems employ the Predictor-Estimator framework, concretely with a pre-trained XLM-Roberta as Predictor and a task-specific classifier or regressor as Estimator. For all tasks, we improve our systems by incorporating the post-edited sentence or an additional high-quality translation sentence, either through multitask learning or by encoding it with the predictors directly. Moreover, in the zero-shot setting, our data augmentation strategy based on Monte-Carlo Dropout brings significant improvement on the DA sub-task. Notably, our submissions achieve remarkable results over all tasks.
2020
NJU’s submission to the WMT20 QE Shared Task
Qu Cui | Xiang Geng | Shujian Huang | Jiajun Chen
Proceedings of the Fifth Conference on Machine Translation
This paper describes our system for the sentence-level and word-level Quality Estimation Shared Task of WMT20. Our system is based on the QE Brain, and we simply enhance it by injecting noise at the target side. To obtain deep bi-directional information, we use a masked language model at the target side instead of two single-directional decoders. Meanwhile, we try to use the extra QE data from WMT17 and WMT19 to improve our system’s performance. Finally, we ensemble the features or the results from different models to get our best results. Our system finished fifth at the sentence level on both the EN-ZH and EN-DE language pairs.