Yanyi Huang
2026
Learning Diverse Responses with Prefix-Conditioned Supervised Fine-Tuning
Zhiyuan Fan | Guanqiao Chen | Yanyi Huang | Mingkuan Zhao | Dadi Guo | Yi R. Fung
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have shown strong performance on hard reasoning and general instruction-following tasks. However, when multiple outputs are sampled for the same prompt, they often produce highly homogeneous, repetitive responses, which makes exploration inefficient. This limits the gains from test-time scaling and constrains the upper bound of reinforcement learning (RL) training. We attribute this issue in part to supervised fine-tuning (SFT): when a single prompt is paired with multiple reference responses, the model is trained to generate different outputs under the same prior condition, which induces optimization interference and can lead to diversity collapse. To address this, we propose Prefix-Conditioned SFT (P-SFT), a simple yet effective method that constructs semantically consistent yet distributionally distinct prior contents for different responses. This projects the instruction into distinct latent regions, establishing diverse prior distributions and decoupling the one-to-many mapping. Experiments on large reasoning language models show that our approach improves absolute performance by 5.3% and increases generation diversity by 198.3% on average, while substantially enhancing test-time scaling. Notably, our prefixing strategy can also be applied at inference time alone, without any additional training, and still yields significant gains in both diversity and reasoning performance for instruction-tuned LLMs and reasoning-enhanced models.
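The data-construction idea above can be illustrated with a minimal sketch. It assumes that each "prior content" is a short, meaning-preserving preamble placed before the instruction, drawn from a fixed pool; the pool `PREFIX_POOL`, the helper names, and the placement of the prefix before the prompt are all hypothetical illustrations, not the authors' actual prefix construction. The sketch only shows the structural difference: plain SFT maps one conditioning context to many responses, whereas prefix conditioning gives each reference response its own context, and the same pool can be reused at inference time alone.

```python
from dataclasses import dataclass
from typing import List, Sequence

# HYPOTHETICAL prefix pool: paraphrased preambles with the same meaning but
# different surface forms. The paper's real prefix construction may differ.
PREFIX_POOL: Sequence[str] = (
    "Let's think about this carefully.",
    "Let us reason through this step by step.",
    "Here is one way to work this out.",
    "Consider the problem from first principles.",
)


@dataclass
class SFTExample:
    prompt: str    # conditioning context the model sees
    response: str  # reference response the model is trained to produce


def plain_sft_examples(prompt: str, responses: List[str]) -> List[SFTExample]:
    # Baseline: every reference response shares one identical context,
    # i.e. a one-to-many mapping from a single prompt.
    return [SFTExample(prompt=prompt, response=r) for r in responses]


def prefix_conditioned_examples(
    prompt: str, responses: List[str], prefixes: Sequence[str] = PREFIX_POOL
) -> List[SFTExample]:
    # Assumed P-SFT-style construction: give each response its own prefix so
    # that every (context, response) pair is one-to-one.
    if len(responses) > len(prefixes):
        raise ValueError("need at least as many distinct prefixes as responses")
    return [
        SFTExample(prompt=f"{prefix}\n\n{prompt}", response=response)
        for prefix, response in zip(prefixes, responses)
    ]


def inference_prompts(
    prompt: str, n: int, prefixes: Sequence[str] = PREFIX_POOL
) -> List[str]:
    # Inference-time-only variant: build n sampling contexts, cycling through
    # the pool when n exceeds the number of prefixes.
    return [f"{prefixes[i % len(prefixes)]}\n\n{prompt}" for i in range(n)]


if __name__ == "__main__":
    prompt = "What is 17 * 24?"
    responses = ["17 * 24 = 408.", "24 * 17 = 240 + 168 = 408."]
    plain = plain_sft_examples(prompt, responses)
    conditioned = prefix_conditioned_examples(prompt, responses)
    print(len({ex.prompt for ex in plain}))        # distinct contexts: 1
    print(len({ex.prompt for ex in conditioned}))  # distinct contexts: 2
    for p in inference_prompts(prompt, 3):
        print(repr(p))
```

Cycling through the pool in `inference_prompts` is a simplifying choice for the sketch; a real setup might sample prefixes randomly or generate them per prompt.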