2025
TRANS-ZERO: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data
Wei Zou | Sen Yang | Yu Bao | Shujian Huang | Jiajun Chen | Shanbo Cheng
Findings of the Association for Computational Linguistics: ACL 2025
The rise of Large Language Models (LLMs) has reshaped machine translation (MT), but multilingual MT still relies heavily on parallel data for supervised fine-tuning (SFT), facing challenges such as data scarcity for low-resource languages and catastrophic forgetting. To address these issues, we propose TRANS-ZERO, a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of LLMs. TRANS-ZERO combines Genetic Monte-Carlo Tree Search (G-MCTS) with preference optimization, achieving strong translation performance that rivals supervised methods. Experiments demonstrate that this approach not only matches the performance of models trained on large-scale parallel data but also excels in non-English translation directions. Further analysis reveals that G-MCTS itself significantly enhances translation quality by exploring semantically consistent candidates through iterative translation, providing a robust foundation for the framework’s success.
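The abstract describes the self-play idea only at a high level. Below is a minimal, hypothetical sketch of how such a loop might collect preference pairs from monolingual data alone, scoring candidates by round-trip semantic consistency. The names `translate` and `semantic_similarity` are illustrative stand-ins, and the tree search is reduced to flat candidate sampling; this is not the paper's G-MCTS implementation.

```python
# Toy self-play sketch: sample translations, rank them by round-trip
# consistency with the monolingual source, and emit a preference pair.
import random

def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for an LLM translation call; tagged so candidates vary.
    return f"[{tgt}-v{random.randint(0, 99)}] {text}"

def semantic_similarity(a: str, b: str) -> float:
    # Stand-in for an embedding-based similarity score in [0, 1].
    return random.random()

def round_trip_score(source: str, candidate: str, src: str, tgt: str) -> float:
    """Back-translate the candidate and compare the reconstruction to the
    original source: a proxy for semantic consistency needing no references."""
    back = translate(candidate, tgt, src)
    return semantic_similarity(source, back)

def self_play_step(source: str, src: str, tgt: str, n_candidates: int = 4):
    """One self-play step: best-ranked candidate becomes 'chosen', the
    worst becomes 'rejected', yielding data for preference optimization."""
    candidates = [translate(source, src, tgt) for _ in range(n_candidates)]
    ranked = sorted(candidates,
                    key=lambda c: round_trip_score(source, c, src, tgt),
                    reverse=True)
    return {"prompt": source, "chosen": ranked[0], "rejected": ranked[-1]}

print(self_play_step("Ein Beispielsatz.", src="de", tgt="en"))
```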
2024
MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization
Shuaijie She | Wei Zou | Shujian Huang | Wenhao Zhu | Xiang Liu | Xiang Geng | Jiajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Intuitively, reasoning abilities should be language-agnostic. However, existing LLMs exhibit inconsistent reasoning abilities across languages: reasoning in a dominant language such as English is superior to reasoning in other languages, owing to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) that aligns the reasoning processes in other languages with those in the dominant language. Specifically, we harness an off-the-shelf translation model to score the consistency between answers in non-dominant and dominant languages, and adopt this consistency as the preference signal for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), with improved reasoning consistency across languages. The project is available at https://github.com/NJUNLP/MAPO.
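A minimal sketch of the alignment-as-preference idea follows: rank multilingual reasoning chains by their consistency with the dominant-language answer and feed the resulting pairs to the standard DPO objective the abstract names. The functions `translate_to_english` and `answer_consistency` are hypothetical stand-ins for the off-the-shelf translation model and its scoring; the DPO loss is the usual formulation, not code from the project.

```python
# Sketch: consistency-ranked preference pairs plus the standard DPO loss.
import math

def translate_to_english(text: str) -> str:
    return text  # stand-in: pretend the answer is already in English

def answer_consistency(candidate: str, dominant_answer: str) -> float:
    """Stand-in consistency score: token Jaccard overlap between the
    translated candidate and the dominant-language answer."""
    a = set(translate_to_english(candidate).split())
    b = set(dominant_answer.split())
    return len(a & b) / max(len(a | b), 1)

def build_preference_pair(question, candidates, dominant_answer):
    """Most/least consistent candidates become chosen/rejected."""
    ranked = sorted(candidates,
                    key=lambda c: answer_consistency(c, dominant_answer))
    return {"prompt": question, "chosen": ranked[-1], "rejected": ranked[0]}

def dpo_loss(lp_c, lp_r, ref_lp_c, ref_lp_r, beta=0.1):
    """Standard DPO loss for one pair:
    -log sigmoid(beta * ((lp_c - ref_lp_c) - (lp_r - ref_lp_r)))."""
    margin = beta * ((lp_c - ref_lp_c) - (lp_r - ref_lp_r))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

pair = build_preference_pair(
    "2+3=?", ["the answer is 5", "the answer is 6 maybe"], "the answer is 5")
print(pair, dpo_loss(-1.0, -2.0, -1.5, -1.8))
```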
2023
Local Interpretation of Transformer Based on Linear Decomposition
Sen Yang | Shujian Huang | Wei Zou | Jianbing Zhang | Xinyu Dai | Jiajun Chen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In recent years, deep neural networks (DNNs) have achieved state-of-the-art performance on a wide range of tasks. However, limitations in interpretability have hindered their application in the real world. This work proposes to interpret neural networks via linear decomposition and finds that a ReLU-activated Transformer can be regarded as a linear model on a single input. We further leverage this linearity and propose a linear decomposition of the model output to generate local explanations. Our evaluation on sentiment classification and machine translation shows that our method achieves competitive performance in efficiency and fidelity of explanation. In addition, we demonstrate the potential of our approach through examples of error analysis on multiple tasks.
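The key observation, that a ReLU network is linear on a single input because ReLU reduces to a fixed 0/1 mask for that input, can be checked on a toy model. The sketch below uses a bias-free two-layer MLP in NumPy rather than the paper's Transformer-specific decomposition; it only illustrates the exact per-feature decomposition that fixed activations make possible.

```python
# Toy check: with ReLU frozen as a mask for one input x, the output is
# y = A @ x, and A * x decomposes y exactly into per-feature contributions.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
x = rng.normal(size=4)

h = W1 @ x
mask = (h > 0).astype(float)        # ReLU as a fixed diagonal mask for this x
y = W2 @ (mask * h)                 # network output

# Equivalent linear map for THIS input: A = W2 @ diag(mask) @ W1
A = W2 @ (mask[:, None] * W1)
assert np.allclose(y, A @ x)

# Local explanation: contribution of input feature j to output i is A[i, j] * x[j]
contributions = A * x[None, :]      # shape (3, 4); rows sum back to y
assert np.allclose(contributions.sum(axis=1), y)
print(contributions)
```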
Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search
Xiang Geng | Yu Zhang | Zhejian Lai | Shuaijie She | Wei Zou | Shimin Tao | Hao Yang | Jiajun Chen | Shujian Huang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Machine translation (MT) quality estimation (QE) is the crucial task of estimating the quality of MT outputs when reference translations are unavailable. Many studies focus on generating pseudo data from large parallel corpora and achieve remarkable success in the supervised setting. However, pseudo-data solutions are less satisfying in unsupervised scenarios because the pseudo labels are inaccurate or the pseudo translations differ from real ones. To address these problems, we propose to generate pseudo data with the MT model itself using constrained beam search (CBSQE). CBSQE preserves the reference parts to which the MT model assigns high probability as correct translations, while letting the MT model generate the remaining parts, which are labeled as wrong. CBSQE can therefore reduce the false negative labels caused by synonyms, and beam search prefers realistic hypotheses with higher MT generation likelihood. Extensive experiments demonstrate that CBSQE outperforms strong baselines in both supervised and unsupervised settings. Analyses further show the superiority of CBSQE. The code is available at https://github.com/NJUNLP/njuqe.
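A hypothetical sketch of the labeling idea: walk the reference, keep tokens the MT model assigns high probability (label OK), and let the model substitute its own preferred token otherwise (label BAD). The `mt_token_probs` oracle and the greedy, beam-size-1 decoding are illustrative simplifications of the paper's constrained beam search, not its implementation.

```python
# Toy CBSQE-style pseudo-data construction with word-level OK/BAD labels.
def mt_token_probs(prefix, vocab):
    """Stand-in for the MT model's next-token distribution given the
    source and decoded prefix. Here: uniform, biased toward 'the'."""
    probs = {w: 1.0 / len(vocab) for w in vocab}
    probs["the"] = probs.get("the", 0.0) + 0.5
    total = sum(probs.values())
    return {w: p / total for w, p in probs.items()}

def make_pseudo_qe(reference, vocab, keep_threshold=0.2):
    """Keep high-probability reference tokens as OK; where the model
    disagrees, insert its argmax token and label it BAD."""
    hyp, labels, prefix = [], [], []
    for ref_tok in reference:
        probs = mt_token_probs(prefix, vocab)
        if probs.get(ref_tok, 0.0) >= keep_threshold:
            hyp.append(ref_tok); labels.append("OK")    # reference kept
        else:
            best = max(probs, key=probs.get)
            hyp.append(best); labels.append("BAD")      # model substitution
        prefix.append(hyp[-1])
    return hyp, labels

vocab = ["the", "cat", "sat", "mat", "dog"]
print(make_pseudo_qe(["the", "cat", "sat"], vocab))
```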
2020
A Reinforced Generation of Adversarial Examples for Neural Machine Translation
Wei Zou | Shujian Huang | Jun Xie | Xinyu Dai | Jiajun Chen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Despite their significant efficacy, neural machine translation systems tend to fail on noisy or otherwise imperfect inputs, which may significantly harm the credibility of these systems; fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance. Instead of collecting and analyzing bad cases with limited handcrafted error features, we investigate this issue by generating adversarial examples via a new paradigm based on reinforcement learning. Our paradigm can expose pitfalls for a given performance metric, e.g., BLEU, and can target any given neural machine translation architecture. We conduct adversarial attacks on two mainstream architectures, RNN-search and Transformer. The results show that our method efficiently produces stable attacks with meaning-preserving adversarial examples. We also present a qualitative and quantitative analysis of the attack's preference patterns, demonstrating its capability for pitfall exposure.
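The core of such an attack is its reward design: degrade the victim's metric while preserving the source's meaning. The sketch below illustrates that reward shape only; `bleu`, `similarity`, the copying victim, and the single-substitution action space are hypothetical stand-ins, and the policy-gradient agent is replaced by random search purely to keep the example self-contained.

```python
# Toy reward for an adversarial attack on NMT: metric drop, gated on
# source-side meaning preservation.
import random

def bleu(hyp, ref):
    """Stand-in metric: unigram overlap ratio in place of real BLEU."""
    return len(set(hyp) & set(ref)) / max(len(set(ref)), 1)

def similarity(a, b):
    """Stand-in meaning-preservation score between source variants."""
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)

def victim_translate(src):
    """Stand-in victim NMT system: simply copies the source."""
    return list(src)

def attack_reward(orig_src, adv_src, ref, min_sim=0.6):
    """Reward = BLEU degradation, with meaning-breaking edits rejected."""
    if similarity(orig_src, adv_src) < min_sim:
        return -1.0
    return (bleu(victim_translate(orig_src), ref)
            - bleu(victim_translate(adv_src), ref))

def random_substitution(src, vocab):
    """One action: substitute a random position with a random token."""
    adv = list(src)
    adv[random.randrange(len(adv))] = random.choice(vocab)
    return adv

src = ["the", "cat", "sat", "on", "the", "mat"]
ref = src[:]                         # toy reference translation
best = max((random_substitution(src, ["dog", "ran"]) for _ in range(20)),
           key=lambda adv: attack_reward(src, adv, ref))
print(best, attack_reward(src, best, ref))
```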