Jian Tong
2026
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Yuming Yang | Mingyoung Lai | Wanxu Zhao | Xiaoran Fan | Zhiheng Xi | Mingqi Wu | Chiyue Huang | Jun Zhao | Haijun Lv | Jian Tong | Yunhua Zhou | Yicheng Zou | Qipeng Guo | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not necessarily yield better students, highlighting the importance of data-student suitability in distillation. Existing methods assess suitability primarily through student likelihood, favoring trajectories that align closely with the student model’s current behavior but overlooking more informative ones. To address this, we propose Rank–Surprisal Ratio (RSR), a simple metric that captures both alignment and informativeness to assess the suitability of a reasoning trajectory. RSR is motivated by the observation that effective trajectories typically balance learning signal strength and behavioral alignment by combining low absolute probability with relatively high-ranked tokens under the student model. Concretely, RSR is defined as the ratio of a trajectory’s average token-wise rank to its average negative log-likelihood, and is straightforward to compute and interpret. Across five student models and reasoning trajectories from 11 diverse teachers, RSR strongly correlates with post-training reasoning performance (average Spearman 0.86), consistently outperforming existing metrics. We further demonstrate its practical utility in both trajectory selection and teacher selection.
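The definition in the abstract above (average token-wise rank divided by average negative log-likelihood) can be sketched in a few lines of NumPy. This is an illustrative reading, not the authors' implementation: the function name `rank_surprisal_ratio`, the input layout (one row of student logits per trajectory token), and the choice of a 1-indexed rank with ties resolved in the token's favor are all assumptions, and the paper may apply details such as rank clipping that the abstract does not mention.

```python
import numpy as np


def rank_surprisal_ratio(logits, token_ids):
    """Hypothetical sketch of RSR for one trajectory.

    logits:    array of shape (T, V), student logits at each of T positions
               (assumed layout; not specified in the abstract).
    token_ids: array of shape (T,), the trajectory's actual tokens.
    """
    logits = np.asarray(logits, dtype=np.float64)
    token_ids = np.asarray(token_ids)

    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Log-probability the student assigns to each observed token.
    positions = np.arange(len(token_ids))
    target_log_probs = log_probs[positions, token_ids]

    # Assumed rank convention: 1 + number of vocabulary entries that the
    # student scores strictly higher than the observed token.
    ranks = 1 + (log_probs > target_log_probs[:, None]).sum(axis=-1)

    avg_rank = ranks.mean()
    avg_surprisal = -target_log_probs.mean()  # average negative log-likelihood
    return avg_rank / avg_surprisal


# Toy usage: two positions over a four-token vocabulary.
example_logits = [[2.0, 1.0, 0.5, -1.0],
                  [0.0, 0.0, 3.0, 0.0]]
example_tokens = [1, 2]
print(rank_surprisal_ratio(example_logits, example_tokens))
```

Under this reading, a trajectory whose tokens sit near the top of the student's ranking while still carrying low absolute probability yields a small numerator and a large denominator, which matches the combination the abstract describes as effective.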
2023
Improving Speech Translation by Fusing Speech and Text
Wenbiao Yin | Zhicheng Liu | Chengqi Zhao | Tao Wang | Jian Tong | Rong Ye
Findings of the Association for Computational Linguistics: EMNLP 2023
In speech translation, leveraging multimodal data to improve model performance and address limitations of individual modalities has shown significant effectiveness. In this paper, we harness the complementary strengths of speech and text to improve speech translation. However, because speech and text are disparate modalities, we observe three aspects of the modality gap that impede their integration in a speech translation model. To tackle these gaps, we propose **Fuse**-**S**peech-**T**ext (**FuseST**), a cross-modal model which supports three distinct input modalities for translation: speech, text and fused speech-text. We leverage multiple techniques for cross-modal alignment and conduct a comprehensive analysis to assess their impact on speech translation, machine translation and fused speech-text translation. We evaluate FuseST on the MuST-C, GigaST and newstest benchmarks. Experiments show that the proposed FuseST achieves an average of 34.0 BLEU on MuST-C En→De/Es/Fr (+1.1 BLEU over SOTA). Further experiments demonstrate that FuseST does not suffer the degradation on the MT task observed in previous work. Instead, it yields an average improvement of 3.2 BLEU over the pre-trained MT model. Code is available at https://github.com/WenbiaoYin/FuseST.
2021
The Volctrans Neural Speech Translation System for IWSLT 2021
Chengqi Zhao | Zhicheng Liu | Jian Tong | Tao Wang | Mingxuan Wang | Rong Ye | Qianqian Dong | Jun Cao | Lei Li
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves an improvement of 7.9 BLEU over the benchmark on the MuST-C test set and approaches the results of a strong cascade solution. For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model. As a result, our final submitted systems exceed the benchmark by around 7 BLEU in the same latency regime. We release our code and model to facilitate both future research and industrial applications.