Ziqiang Xu
2026
NiuTrans.LMT: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs
Yingfeng Luo | Ziqiang Xu | Yuxuan Ouyang | MuRun Yang | DingYang Lin | Kaiyan Chang | Tong Zheng | Bei Li | Peinan Feng | Quan Du | Tong Xiao | JingBo Zhu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models have significantly advanced Multilingual Machine Translation (MMT), yet scaling to many languages while keeping quality robust across directions remains challenging. In this paper, we identify a failure mode of multilingual supervised fine-tuning (SFT) on multi-way parallel data: when such data are reused symmetrically around a pivot language (e.g., English), performance on reverse directions (X → pivot) can drop substantially. We term this phenomenon Directional Degeneration and attribute it to excessive many-to-one mappings, which encourage shortcut learning. We propose Strategic Downsampling (SD), a simple yet effective method to mitigate this degeneration. In addition, we introduce Parallel Multilingual Prompting (PMP), which augments translation instructions with an auxiliary parallel sentence to promote cross-lingual transfer during training and enables optional test-time enhancement when auxiliary translations are available. We further develop NiuTrans.LMT (Large-scale Multilingual Translation, abbreviated as LMT), a Chinese–English-centric suite of multilingual translation models spanning four sizes (0.6B/1.7B/4B/8B) and covering 60 languages and 234 directions. Comprehensive evaluations show that LMT is competitive among open-source MMT systems, and that our 4B LMT model performs on par with or better than substantially larger baselines. We release our models and project resources to support inclusive and scalable MMT.
2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo | Tong Zheng | Yongyu Mu | Bei Li | Qinghong Zhang | Yongqi Gao | Ziqiang Xu | Peinan Feng | Xiaoqian Liu | Tong Xiao | JingBo Zhu
Findings of the Association for Computational Linguistics: ACL 2025
The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that our method matches or surpasses a range of baselines in terms of translation quality, while achieving 2.4× to 6.5× inference speedups and a 75% reduction in the memory footprint of the KV cache. Our method also demonstrates strong generalization across a variety of translation-related tasks.