Learning Diverse Responses with Prefix-Conditioned Supervised Fine-Tuning

Zhiyuan Fan, Guanqiao Chen, Yanyi Huang, Mingkuan Zhao, Dadi Guo, Yi R. Fung

Abstract

Large language models (LLMs) have shown strong performance on hard reasoning and general instruction-following tasks. However, when multiple outputs are sampled for the same prompt, they often produce highly homogeneous, repetitive responses, which leads to inefficient exploration. This limits the gains from test-time scaling and constrains the upper bound of reinforcement learning (RL) training. We attribute this issue in part to supervised fine-tuning (SFT): when a single prompt is paired with multiple reference responses, the model is trained to generate diverse outputs under the same prior condition, which induces optimization interference and can lead to diversity collapse. To address this, we propose Prefix-Conditioned SFT (P-SFT), a simple yet effective method that constructs semantically consistent yet distributionally distinct prior content for different responses. This projects the instruction into distinct latent regions, establishing diverse prior distributions and decoupling the one-to-many mapping. Experiments on large reasoning language models show that our approach improves absolute performance by 5.3% and increases generation diversity by 198.3% on average, while also improving test-time scaling. Notably, even without any additional training, applying our prefixing strategy at inference time alone still yields significant gains in both diversity and reasoning performance for instruction-tuned LLMs and reasoning-enhanced models.
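The core data-construction idea, giving each reference response of a prompt its own distinct prior condition instead of one shared prompt, can be illustrated with a minimal sketch. This is not the authors' implementation: the prefix strings in `PREFIX_POOL`, the choice to prepend the prefix to the prompt, and the helper names `build_prefix_conditioned_examples` and `build_inference_prompts` are all hypothetical assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

# HYPOTHETICAL prefix pool: semantically consistent framings that differ in
# surface form. The paper's actual prefix construction is not described here.
PREFIX_POOL: List[str] = [
    "Approach: work through the problem step by step.",
    "Approach: start from the final goal and reason backward.",
    "Approach: first restate the problem, then solve it.",
    "Approach: try a concrete example before generalizing.",
]


@dataclass
class SFTExample:
    prompt: str  # conditioning context (assumed: loss is not computed here)
    target: str  # reference response the model is trained to produce


def build_prefix_conditioned_examples(
    prompt: str,
    responses: Sequence[str],
    prefixes: Sequence[str] = PREFIX_POOL,
    seed: int = 0,
) -> List[SFTExample]:
    """Pair each reference response with a distinct prefix, so the same prompt
    no longer maps to many targets under one identical condition."""
    if len(responses) > len(prefixes):
        raise ValueError("need at least as many distinct prefixes as responses")
    rng = random.Random(seed)
    # sample() draws without replacement, so no two responses share a prefix.
    chosen = rng.sample(list(prefixes), k=len(responses))
    return [
        SFTExample(prompt=f"{p}\n\n{prompt}", target=r)
        for p, r in zip(chosen, responses)
    ]


def build_inference_prompts(
    prompt: str,
    n: int,
    prefixes: Sequence[str] = PREFIX_POOL,
    seed: int = 0,
) -> List[str]:
    """Inference-only variant: one distinct prefix per sampled generation."""
    if n > len(prefixes):
        raise ValueError("need at least as many distinct prefixes as samples")
    rng = random.Random(seed)
    return [f"{p}\n\n{prompt}" for p in rng.sample(list(prefixes), k=n)]


if __name__ == "__main__":
    for ex in build_prefix_conditioned_examples("What is 2 + 3?", ["5", "3 + 2 = 5"]):
        print(repr(ex.prompt), "->", repr(ex.target))
```

Sampling prefixes without replacement is the design choice that matters here: it guarantees that responses sharing a prompt never share a conditioning context, which is the one-to-many decoupling the abstract describes.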

Anthology ID: 2026.acl-long.9
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 247–276
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.9/
Cite (ACL): Zhiyuan Fan, Guanqiao Chen, Yanyi Huang, Mingkuan Zhao, Dadi Guo, and Yi R. Fung. 2026. Learning Diverse Responses with Prefix-Conditioned Supervised Fine-Tuning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 247–276, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Learning Diverse Responses with Prefix-Conditioned Supervised Fine-Tuning (Fan et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.9.pdf
Checklist: 2026.acl-long.9.checklist.pdf