Jie Sun

2025

The alignment of Large Language Models (LLMs) is crucial for ensuring their safety and reliability in practical applications. Direct Preference Optimization (DPO) has emerged as an efficient method that directly optimizes models using preference pairs, significantly reducing resource demands. However, the effectiveness of DPO heavily depends on the data quality, which is frequently compromised by noise. In this work, we propose 𝛾-PO, a dynamic target margin preference optimization algorithm that adjust reward margins at the pairwise level. By introducing instance-specific margin calibration, 𝛾-PO strategically prioritizes high-confidence pairs (those demonstrating higher reward margins) while suppressing potential noise from ambiguous pairs. Moreover, 𝛾-PO is a plug-and-play method, compatible with variants of DPO that rely on reward margin between preference pairs. Across benchmarks such as AlpacaEval2 and Arena-Hard, 𝛾-PO achieves an average 4.4% improvement over other baselines, setting new benchmarks for state-of-the-art performance. Additionally, 𝛾-PO requires minimal code changes and has a negligible impact on training efficiency, making it a robust solution for enhancing LLMs alignment. Our codes are available at https://github.com/sunjie279/gammaPO.

Auctions are a vital economic mechanism used to determine the market value of goods or services through competitive bidding within a specific framework. However, much of the current research primarily focuses on the bidding algorithms used within auction mechanisms. This often neglects the potential benefits of incorporating individual users’ unique preferences into the valuation process. Our theoretical and empirical analysis demonstrates that valuation errors can significantly impact the overall utility. To bridge this gap, we propose a personalized valuation framework, namely Large Language Models-powered Personalized Valuation (LaMP-Val), which integrates Large Language Models to incorporate personalized semantic preference into users valuation process. LaMP-Val integrating three components: data, learning, and evaluation. The data component tackles the challenge of building a novel dataset specifically for LLMs fine-tuning in personalized valuation modeling. The learning component introduces a diversity template to enhance LLMs’ capacity for modeling fine-grained personal valuation patterns. The evaluation component establishes a closed-loop system where LLM-generated valuations interact with bidding strategies and auction. It proposes two novel metrics to quantify valuation precision and bidding intention accuracy in personalized scenarios. Extensive experiments show that LaMP-Val more accurately captures personalized values and achieves greater profits than baseline approaches.

Co-authors

Chi Luo 1

Venues

findings2

Fix author