Jian Tong
2026
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Yuming Yang | Mingyoung Lai | Wanxu Zhao | Xiaoran Fan | Zhiheng Xi | Mingqi Wu | Chiyue Huang | Jun Zhao | Haijun Lv | Jian Tong | Yunhua Zhou | Yicheng Zou | Qipeng Guo | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not necessarily yield better students, highlighting the importance of data-student suitability in distillation. Existing methods assess suitability primarily through student likelihood, favoring trajectories that align closely with the student model’s current behavior but overlooking more informative ones. Addressing this, we propose Rank–Surprisal Ratio (RSR), a simple metric that captures both alignment and informativeness to assess the suitability of a reasoning trajectory. RSR is motivated by the observation that effective trajectories typically balance learning signal strength and behavioral alignment by combining low absolute probability with relatively high-ranked tokens under the student model. Concretely, RSR is defined as the ratio of a trajectory’s average token-wise rank to its average negative log-likelihood, and is straightforward to compute and interpret. Across five student models and reasoning trajectories from 11 diverse teachers, RSR strongly correlates with post-training reasoning performance (average Spearman 0.86), consistently outperforming existing metrics. We further demonstrate its practical utility in both trajectory selection and teacher selection.
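The ratio described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rank_surprisal_ratio`, the input layout (one list of student-model logits per token position), and the 1-indexed rank convention are all assumptions made for illustration, and the paper may apply details such as rank clipping that the abstract does not mention.

```python
import math

def rank_surprisal_ratio(logits, token_ids):
    """Hypothetical sketch: average token rank divided by average NLL.

    logits: one list of student-model scores over the vocabulary per position.
    token_ids: the trajectory's token index at each position.
    """
    if not token_ids or len(logits) != len(token_ids):
        raise ValueError("logits and token_ids must be non-empty and aligned")
    total_rank = 0.0
    total_nll = 0.0
    for scores, tok in zip(logits, token_ids):
        # Negative log-likelihood via a max-shifted log-sum-exp for stability.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total_nll += log_z - scores[tok]
        # Assumed convention: rank 1 is the highest-scored token.
        total_rank += 1 + sum(1 for s in scores if s > scores[tok])
    n = len(token_ids)
    return (total_rank / n) / (total_nll / n)

# Toy example with a 3-token vocabulary and a 2-token trajectory.
toy_logits = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
toy_tokens = [0, 0]
print(rank_surprisal_ratio(toy_logits, toy_tokens))
```

In practice the logits would come from a forward pass of the student model over the teacher's trajectory; the toy lists here only show the arithmetic.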
Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design
Xu Guo | Qiming Ge | Jian Tong | Kedi Chen | Jin Zhang | Xiaogui Yang | Xuan Gao | Haijun Lv | Zhihui Lu | Yicheng Zou | Qipeng Guo
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement Learning with Verifiable Rewards (RLVR) significantly enhances the reasoning capabilities of Large Language Models. When applied to RLVR, Multiple-Choice Questions (MCQs) offer a scalable source of verifiable data but risk inducing reward hacking, where models shortcut reasoning via random guessing or simple elimination. Current approaches often mitigate this by converting MCQs to open-ended formats, thereby discarding the contrastive signal provided by expert-designed distractors. In this work, we systematically investigate the impact of option design on RLVR. Our analysis highlights two primary insights: (1) Mismatches in option counts between training and testing degrade performance. (2) Strong distractors effectively mitigate random guessing, enabling effective RLVR training even with 2-way questions. Motivated by these findings, we propose Iterative Distractor Curation (IDC), a framework that actively constructs high-quality distractors to block elimination shortcuts and promote deep reasoning. Experiments on various benchmarks demonstrate that our method effectively enhances distractor quality and yields significant gains in RLVR training compared to the original data.
2023
Improving Speech Translation by Fusing Speech and Text
Wenbiao Yin | Zhicheng Liu | Chengqi Zhao | Tao Wang | Jian Tong | Rong Ye
Findings of the Association for Computational Linguistics: EMNLP 2023
In speech translation, leveraging multimodal data to improve model performance and address limitations of individual modalities has shown significant effectiveness. In this paper, we harness the complementary strengths of speech and text to improve speech translation. However, speech and text are disparate modalities, and we observe three aspects of the modality gap that impede their integration in a speech translation model. To tackle these gaps, we propose **Fuse**-**S**peech-**T**ext (**FuseST**), a cross-modal model which supports three distinct input modalities for translation: speech, text and fused speech-text. We leverage multiple techniques for cross-modal alignment and conduct a comprehensive analysis to assess their impact on speech translation, machine translation and fused speech-text translation. We evaluate FuseST on the MuST-C, GigaST and newstest benchmarks. Experiments show that the proposed FuseST achieves an average of 34.0 BLEU on MuST-C En→De/Es/Fr (+1.1 BLEU over the previous SOTA). Further experiments demonstrate that FuseST does not suffer the degradation on the MT task observed in previous works. Instead, it yields an average improvement of 3.2 BLEU over the pre-trained MT model. Code is available at https://github.com/WenbiaoYin/FuseST.
2021
The Volctrans Neural Speech Translation System for IWSLT 2021
Chengqi Zhao | Zhicheng Liu | Jian Tong | Tao Wang | Mingxuan Wang | Rong Ye | Qianqian Dong | Jun Cao | Lei Li
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves a 7.9 BLEU improvement over the benchmark on the MuST-C test set and approaches the results of a strong cascade solution. For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model. As a result, our final submitted systems exceed the benchmark by around 7 BLEU in the same latency regime. We release our code and model to facilitate both future research and industrial applications.
Co-authors
- Qipeng Guo 2
- Zhicheng Liu 2
- Haijun Lv 2
- Tao Wang 2
- Rong Ye 2
- Chengqi Zhao 2
- Yicheng Zou 2
- Jun Cao 1
- Kedi Chen 1
- Qianqian Dong 1
- Xiaoran Fan 1
- Xuan Gao 1
- Qiming Ge 1
- Tao Gui 1
- Xu Guo 1
- Chiyue Huang 1
- Xuan-Jing Huang (黄萱菁) 1
- Mingyoung Lai 1
- Lei Li 1
- Zhihui Lu 1
- Mingxuan Wang 1
- Mingqi Wu 1
- Zhiheng Xi 1
- Yuming Yang 1
- Xiaogui Yang 1
- Wenbiao Yin 1
- Qi Zhang 1
- Jin Zhang 1
- Wanxu Zhao 1
- Jun Zhao 1
- Yunhua Zhou 1