MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning
Wenshuo Zhao, Haoxing Zhai, Xinyu Qiu, Zhenting Qi, Shuhe Li, Linchao Zhu
Abstract
Recently, large reasoning models (LRMs) have demonstrated state-of-the-art performance across a wide range of benchmarks. However, a common challenge for these models is the “overthinking” problem, which leads to excessive reasoning steps and significant computational overhead. The issues with long Chain-of-Thought (CoT) reasoning are especially pronounced in smaller models (≤ 3B parameters): aside from producing excessively verbose “reflection words”, they often exhibit repetition and get trapped in unproductive generation loops. Existing solutions typically either use flexible reasoning chains as training data or leverage the model’s latent space to bypass intermediate reasoning steps, but none of these methods directly optimizes reasoning trajectories during the sampling phase of training. In our work, we introduce the Multi-Turn Intervention Sampling Framework (MuTIS). Our framework leverages multi-turn interventions to produce concise reasoning chains and fine-tunes reasoning models through reinforcement learning, demonstrably breaking the accuracy-efficiency trade-off. It also scales well, exhibiting strong performance on 7B models. Code is available at https://github.com/Edric-Zhao/MuTIS/tree/main.
- Anthology ID:
- 2025.emnlp-main.690
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 13680–13692
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.690/
- Cite (ACL):
- Wenshuo Zhao, Haoxing Zhai, Xinyu Qiu, Zhenting Qi, Shuhe Li, and Linchao Zhu. 2025. MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13680–13692, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning (Zhao et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.690.pdf