Jiang Zhou
2026
SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment
Tianyu Dong | Yangyang Liu | Jiang Zhou | Xinwei Wu | Xiaohu Zhao | Hao Wang | Heng Liu | Linlong Xu | Longyue Wang | Weihua Luo | Shaolin Zhu | Deyi Xiong
Findings of the Association for Computational Linguistics: ACL 2026
Sparse Mixture-of-Experts (MoE) architectures have emerged as an increasingly influential paradigm because they balance parameter scalability against computational efficiency. However, low-resource language tokens are often routed to different experts than those predominantly activated by high-resource inputs, which limits cross-lingual expert sharing. This cross-lingual routing divergence hinders the efficacy of MoE models in multilingual contexts. To address this issue, we propose SARA (Semantically Anchored Routing Alignment), a framework that uses high-resource languages as anchors to transfer specialized capabilities to low-resource languages. SARA explicitly aligns the routing distribution of multilingual inputs with high-resource semantic anchors using a symmetric Jensen-Shannon (JS) divergence constraint. Unlike traditional distillation methods that operate on output logits, SARA directly aligns the internal routing distributions of MoE layers, encouraging mechanistic consistency in expert selection across languages. We conduct experiments on 2 LLMs across 5 low-resource languages and 3 benchmarks. Experimental results demonstrate that SARA outperforms standard instruction tuning (e.g., +0.8% on Qwen3-30B-A3B and +1.2% on Phi-3.5-MoE-instruct on the Global-MMLU benchmark). Further analyses show that SARA effectively addresses performance bottlenecks in low-resource languages, providing a scalable pathway to enhance multilingual capabilities in sparse architectures.
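As a rough illustration of the symmetric JS divergence constraint described in the abstract above, the sketch below compares two routing distributions over experts. It is not the authors' implementation: the four-expert router, the logit values, and the function names (`softmax`, `kl`, `js_divergence`) are assumptions chosen for the example, and the abstract does not say how the term is weighted or aggregated across layers and tokens.

```python
import math

def softmax(logits):
    """Turn router logits into a probability distribution over experts."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence between two distributions."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

# Hypothetical router logits for one MoE layer with four experts.
anchor_logits = [2.0, 0.5, -1.0, 0.0]   # high-resource anchor input (assumed values)
target_logits = [-1.0, 0.0, 2.0, 0.5]   # low-resource parallel input (assumed values)

p_anchor = softmax(anchor_logits)
p_target = softmax(target_logits)

# An alignment term of this kind would be added to the training loss.
alignment_loss = js_divergence(p_anchor, p_target)
```

With natural logarithms, JS divergence is symmetric in its arguments and bounded by ln 2, so a term like this stays finite even when the two routers prefer disjoint experts.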
Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation
Jiang Zhou | Xiaohu Zhao | Xinwei Wu | Tianyu Dong | Hao Wang | Yangyang Liu | Heng Liu | Linlong Xu | Longyue Wang | Weihua Luo | Deyi Xiong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Cross-cultural entity translation remains challenging for large language models (LLMs) because they usually produce literal or phonetic renderings instead of culturally appropriate translations in context. However, relevant knowledge may already be encoded in model parameters during large-scale pre-training. To incentivize the effective use of parametric knowledge, we propose EA-RLVR (Entity-Anchored Reinforcement Learning with Verifiable Rewards), a training framework that optimizes cross-cultural entity translation without relying on external knowledge bases. EA-RLVR anchors supervision on a verifiable, entity-level reward signal and incorporates lightweight structural gates to stabilize optimization. This design steers the model toward learning a robust reasoning process rather than merely imitating reference translations. We evaluate EA-RLVR on XC-Translate and observe consistent improvements in both entity translation accuracy and out-of-domain generalization. Specifically, training on merely 7k samples boosts Qwen3-14B’s entity translation accuracy from 23.66% to 31.87% on a 50k test set comprising entirely unseen entities. The learned entity translation ability also transfers to general translation, yielding +1.35 XCOMET on WMT24pp, which scales to +1.59 with extended optimization. Extensive analyses of pass@k dynamics and reward formulations attribute these gains to superior sampling efficiency and a stable optimization landscape.
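To make the idea of a verifiable, entity-level reward with a structural gate more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the abstract does not specify the reward formula or the gates, so the function name `entity_reward`, the non-empty and maximum-length gate, and the substring match against accepted entity translations are hypothetical stand-ins.

```python
def entity_reward(output, accepted_entities, max_len=512):
    """Return 1.0 if the output passes a simple gate and contains an accepted
    rendering of the target entity, else 0.0.

    Hypothetical sketch: the gate (non-empty, length-capped) and the substring
    check are assumptions, not the formulation used in the paper.
    """
    text = output.strip()
    # Structural gate (assumed): reject empty or runaway outputs outright.
    if not text or len(text) > max_len:
        return 0.0
    # Verifiable entity check (assumed): any accepted rendering must appear.
    return 1.0 if any(entity in text for entity in accepted_entities) else 0.0

# "The Hunger Games" is published in German as "Die Tribute von Panem".
accepted = ["Die Tribute von Panem"]
good = entity_reward("Ich habe Die Tribute von Panem gelesen.", accepted)
literal = entity_reward("Ich habe Die Hungerspiele gelesen.", accepted)
```

Here `good` is 1.0 and `literal` is 0.0: the literal rendering earns no reward even though it is fluent, which is the kind of signal an entity-anchored reward is meant to provide.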
From Curated Data to Scalable Models: Continual Pre-training of Dense and MoE Large Language Models for Tibetan
Lei Yang | Leiyu Pan | Bojian Xiong | Renren Jin | Shaowei Zhang | Yue Chen | Ling Shi | Jiang Zhou | Junru Wu | Zhen Wang | Jianxiang Peng | Juesi Xiao | Tianyu Dong | Zhuowen Han | Zhuo Chen | Yuqi Ren | Deyi Xiong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, yet their performance remains heavily biased toward high-resource languages. Tibetan, despite its cultural significance and large speaker population, is still substantially underrepresented. In this work, we present a comprehensive pipeline for advancing Tibetan language modeling through large-scale data curation and continual pre-training. We construct a 72 GB high-quality Tibetan corpus, the largest to date, and adapt Qwen2.5-7B through balanced multilingual continual pre-training with Tibetan, Chinese, and English, followed by multilingual instruction tuning. To further scale capacity efficiently, we extend the dense model to a 50B-A10B Mixture-of-Experts architecture. Because no standardized Tibetan benchmarks exist, we build multiple evaluation datasets via high-quality translation and human verification. Experimental results show that both dense and MoE models consistently outperform existing open-source and Tibetan-focused models of similar scale across diverse tasks. Our work advances Tibetan-centric LLM research and provides transferable insights for extending LLMs to other low-resource languages. We will release the model weights, evaluation benchmarks, and detailed data processing documentation in a follow-up release.
M2PO: Multi-Perspective Multi-Pair Preference Optimization for Machine Translation
Hao Wang | Linlong Xu | Heng Liu | Yangyang Liu | Xiaohu Zhao | Bo Zeng | Liangying Shao | Yichen Dong | Xinwei Wu | Jiang Zhou | Tianyu Dong | Xiangxiang Zeng | Longyue Wang | Weihua Luo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Aligning Large Language Models (LLMs) to human preferences is pivotal for Machine Translation (MT), yet current approaches are often hindered by misleading reward signals. Our analysis reveals that prevailing Quality Estimation (QE) models exhibit a systematic blind spot towards **partial errors**—specifically partial hallucinations and omissions—often favoring superficially fluent but unfaithful translations. To address this, we propose **M2PO** (**M**ulti-Perspective **M**ulti-Pair **P**reference **O**ptimization), a data-centric framework for preference optimization in machine translation. First, to correct the bias towards fluency, M2PO uses a multi-perspective alignment mechanism that decouples semantic fidelity from fluency, prioritizing faithfulness via a curriculum strategy. Second, with the bias corrected, partial errors fall between perfect and severely incorrect translations, making them inefficient to learn via standard best-versus-worst comparisons. We thus introduce a multi-pair objective that leverages the full candidate list to capture these fine-grained error signals. Experiments on WMT23, WMT24, and FLORES-200 show that M2PO enables a 9B model to outperform leading open-source baselines and achieve parity with proprietary models like GPT-4o and Gemini-2.0-Flash, demonstrating significant potential for efficient, high-fidelity LLM-based translation.
Co-authors
- Tianyu Dong 4
- Yangyang Liu 3
- Heng Liu 3
- Weihua Luo 3
- Hao Wang 3
- Longyue Wang 3
- Xinwei Wu 3
- Deyi Xiong (德意 熊) 3
- Linlong Xu 3
- Xiaohu Zhao 3
- Yue Chen 1
- Zhuo Chen 1
- Yichen Dong 1
- Zhuowen Han 1
- Renren Jin 1
- Leiyu Pan 1
- Jianxiang Peng 1
- Yuqi Ren 1
- Liangying Shao 1
- Ling Shi 1
- Zhen Wang 1
- Junru Wu 1
- Juesi Xiao 1
- Bojian Xiong 1
- Lei Yang 1
- Bo Zeng 1
- Xiangxiang Zeng 1
- Shaowei Zhang 1
- Shaolin Zhu 1