Bowei Tian


2025

Router-Tuning: A Simple and Effective Approach for Dynamic Depth
Shwai He | Tao Ge | Guoheng Sun | Bowei Tian | Xiaoyang Wang | Dong Yu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The Mixture of Depths (MoD) was introduced to improve computational efficiency by dynamically skipping less important layers, reducing redundant computation while maintaining model capacity. Despite its promise, existing MoD approaches remain under-explored and face two main challenges: (1) high training costs, because the entire model must be trained along with the routers that determine which layers to skip, and (2) performance degradation when important layers are bypassed. In response to the first issue, we propose Router-Tuning, which fine-tunes only the routers on a small dataset, drastically reducing the computational overhead associated with full model training. For the second challenge, we investigate dynamic depth across different architectures and granularities, demonstrating its effectiveness on Attention layers and MoE layers. This method preserves the model’s performance while significantly enhancing computational and memory efficiency. Extensive experiments demonstrate that our approach delivers competitive results while dramatically improving computational efficiency, e.g., a 21% speedup with only a 0.2% performance drop. The code will be released upon acceptance.
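
Since the paper's code is not yet released, the following is a minimal, illustrative sketch of the idea the abstract describes: a lightweight per-layer router decides whether to execute or skip a block, and only the router parameters are trained. All names here (SkippableBlock, router_threshold) and the specific gating scheme are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a router-gated layer with only the router trainable.
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    def __init__(self, block: nn.Module, hidden_size: int, router_threshold: float = 0.5):
        super().__init__()
        self.block = block
        # Lightweight router: a single linear probe on the mean token state.
        self.router = nn.Linear(hidden_size, 1)
        self.router_threshold = router_threshold
        # Router-Tuning-style setup (assumed): freeze the block, train only the router.
        for p in self.block.parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate in [0, 1] computed from the sequence-averaged hidden state.
        gate = torch.sigmoid(self.router(x.mean(dim=1)))  # shape: (batch, 1)
        if self.training:
            # Soft gating keeps the router differentiable while tuning it.
            return x + gate.unsqueeze(1) * self.block(x)
        # At inference, skip the block when the router score is below threshold.
        if (gate < self.router_threshold).all():
            return x
        return x + self.block(x)

if __name__ == "__main__":
    ffn = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
    layer = SkippableBlock(ffn, hidden_size=64)
    out = layer(torch.randn(2, 16, 64))
    trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
    print(out.shape, trainable)  # only router.weight and router.bias are trainable
```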