Wenbin Liu
2026
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via sequence-level likelihood
Xingyu Lin | Yilin Wen | Du Su | En Wang | Wenbin Liu | Zhonghou Lv | Jinchang Hou | Chenfu Bao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Group Relative Policy Optimization (GRPO) has significantly advanced the reasoning ability of large language models (LLMs), particularly in mathematical reasoning. However, GRPO and related entropy regularization methods still struggle with sparse token-level rewards, an inherent challenge in chain-of-thought (CoT) reasoning. These approaches often rely on undifferentiated token-level entropy regularization, which easily leads to entropy collapse or model degradation under sparse token rewards. In this work, we propose TEPO, a novel token-level framework that (1) leverages sequence-level likelihood to link group-level rewards with individual tokens via token-level aggregation, and (2) introduces a token-level KL-divergence mask constraint that targets tokens with positive advantages and decreasing entropy to mitigate abrupt policy updates. Experiments demonstrate that TEPO not only achieves state-of-the-art performance on mathematical reasoning benchmarks but also markedly enhances training stability, reducing convergence time by 50% compared with GRPO/DAPO.
2021
ISTIC’s Triangular Machine Translation System for WMT2021
Hangcheng Guo | Wenbin Liu | Yanqing He | Tian Lan | Hongjiao Xu | Zhenfeng Wu | You Pan
Proceedings of the Sixth Conference on Machine Translation
This paper describes ISTIC’s submission to the Triangular Machine Translation Task of Russian-to-Chinese machine translation for WMT’2021. To fully utilize the provided corpora and improve translation performance from Russian to Chinese, our system uses the pivot method, which pipelines a Russian-to-English translator and an English-to-Chinese translator to form a Russian-to-Chinese translator. Our system is based on the Transformer architecture, and several effective strategies are adopted to improve translation quality, including corpus filtering, data pre-processing, system combination and model ensemble.
2020
ISTIC’s Neural Machine Translation System for IWSLT’2020
Jiaze Wei | Wenbin Liu | Zhenfeng Wu | You Pan | Yanqing He
Proceedings of the 17th International Conference on Spoken Language Translation
This paper introduces the technical details of the machine translation system of the Institute of Scientific and Technical Information of China (ISTIC) for the 17th International Conference on Spoken Language Translation (IWSLT 2020). ISTIC participated in both translation tasks of the Open Domain Translation track: the Japanese-to-Chinese MT task and the Chinese-to-Japanese MT task. The paper mainly elaborates on the model framework, data preprocessing methods and decoding strategies adopted in our system. In addition, the system's performance on the development set is reported under different settings.