Improving Preference Alignment of LLM with Inference-Free Self-Refinement
Fukun Ma, Kaibin Tian, Jieting Xue, Xiaoyi Wang, Ye Ma, Quan Chen, Peng Jiang, Lijie Wen
Abstract
Large language models (LLMs) develop in-context learning through pretraining and instruction tuning, enabling task adaptation without parameter updates. Self-refinement is one manifestation of this capability: it allows LLMs to iteratively refine their outputs using self-generated feedback. However, empirical observations reveal Inference-Free Self-Refinement (IFSR) in preference alignment: LLMs generate preference-improved outputs via fixed instructions, requiring neither specific feedback nor even initial responses. IFSR in preference alignment has two key components. The refining instruction is a fixed instruction that constrains the output distribution from a preference-semantic perspective; during training, it facilitates joint learning of preference-related semantic representations and data distribution alignment. The pseudo reference response is constructed from paired preference data and serves as a demonstration that guides the output distribution; it mitigates off-policy distributional bias while enhancing token-level preference learning during training. Experiments across multiple datasets demonstrate that incorporating IFSR into preference alignment yields performance improvements of over 10%. Further ablation studies reveal additional characteristics and potential principles of IFSR.
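The abstract does not include the paper's concrete templates, but the two components can be illustrated with a short sketch. Below is a minimal, hypothetical Python example of turning one preference pair into an IFSR-style training example; the wording of the fixed refining instruction and the choice of the rejected response as the pseudo reference response are assumptions for illustration only, not the authors' actual setup.

```python
# Minimal sketch of assembling an IFSR-style training example from paired
# preference data (prompt, chosen, rejected). The refining-instruction text
# and the pseudo-reference construction below are illustrative assumptions;
# the paper's concrete templates may differ.

# Hypothetical fixed refining instruction (not quoted from the paper).
REFINING_INSTRUCTION = (
    "Refine the reference response below so that it better satisfies "
    "the user's preferences."
)


def build_ifsr_example(prompt: str, chosen: str, rejected: str) -> dict:
    """Build one refinement-style training example from a preference pair.

    Assumption: the rejected response acts as the pseudo reference response
    (a demonstration to be improved), and the chosen response is the target.
    """
    model_input = (
        f"{prompt}\n\n"
        f"{REFINING_INSTRUCTION}\n\n"
        f"Reference response:\n{rejected}"
    )
    return {"input": model_input, "target": chosen}


if __name__ == "__main__":
    example = build_ifsr_example(
        prompt="Explain photosynthesis to a child.",
        chosen="Plants eat sunlight! They mix it with air and water ...",
        rejected="Photosynthesis is the process by which autotrophs ...",
    )
    print(example["input"])
```

Under this reading, training on such examples couples the fixed instruction with the demonstration, which is consistent with the abstract's claim that the pseudo reference response guides the output distribution while the target remains the preferred response.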
- Anthology ID:
- 2025.findings-emnlp.1329
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 24459–24473
- URL:
- https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1329/
- DOI:
- 10.18653/v1/2025.findings-emnlp.1329
- Cite (ACL):
- Fukun Ma, Kaibin Tian, Jieting Xue, Xiaoyi Wang, Ye Ma, Quan Chen, Peng Jiang, and Lijie Wen. 2025. Improving Preference Alignment of LLM with Inference-Free Self-Refinement. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24459–24473, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Improving Preference Alignment of LLM with Inference-Free Self-Refinement (Ma et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1329.pdf