Instantly Learning Preference Alignment via In-context DPO

Feifan Song; Yuxuan Fan; Xin Zhang; Peiyi Wang (王培懿); Houfeng Wang

Abstract

Human Preference Alignment (HPA) helps large language models (LLMs) generate safe content. Because fine-tuning is costly, tuning-free methods have emerged, typically modifying LLM decoding via post-processing. In this paper, we propose a novel and effective tuning-free approach to HPA, named In-Context Direct Preference Optimization (ICDPO). We first rethink the derivation of DPO and, working backward from it, build an instant scorer from the states of the LLM before and after in-context learning (ICL). This enables the LLM to both generate responses and select the well-aligned one, as estimated by the instant scorer, thereby enhancing final performance. ICDPO can be further enhanced with a two-stage retriever and an upgraded scorer. Extensive experiments show its effectiveness: it outperforms multiple tuning-free baselines and is even competitive with SFT and DPO. We also conduct detailed analyses to offer comprehensive insights into ICDPO.
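
The abstract does not give the scorer's formula, so the following is only a minimal sketch of one plausible reading of "using the states of the LLM before and after ICL": each candidate response is scored by how much its log-probability rises once demonstrations are placed in the context, and the highest-scoring candidate is returned. The names `Candidate`, `contrastive_score`, and `select_best` are hypothetical, and the log-probabilities are assumed to be computed elsewhere by the LLM; none of this is taken from the paper's code.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """A sampled response plus two log-probabilities (assumed precomputed)."""
    text: str
    logp_with_demos: float     # log p(y | demonstrations, x): state after ICL
    logp_without_demos: float  # log p(y | x): state before ICL


def contrastive_score(c: Candidate) -> float:
    # Assumption: the score is the log-probability gain that the
    # in-context demonstrations give to this response.
    return c.logp_with_demos - c.logp_without_demos


def select_best(candidates: Sequence[Candidate]) -> Candidate:
    # Return the candidate whose likelihood benefits most from the demonstrations.
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=contrastive_score)
```

In this sketch a response that the model already considered likely without demonstrations gains little, while one made noticeably more likely by the demonstrations is preferred; whether the published scorer adds normalization or other terms is not stated in the abstract.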

Anthology ID: 2025.naacl-long.8
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 161–178
URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.8/
Cite (ACL): Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, and Houfeng Wang. 2025. Instantly Learning Preference Alignment via In-context DPO. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 161–178, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Instantly Learning Preference Alignment via In-context DPO (Song et al., NAACL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.8.pdf
