Causal Direct Preference Optimization for Language Model Alignment
Uyen Le, Thin Nguyen, Toan Nguyen, Toan Doan, Trung Le, Bac Le
Abstract
Direct Preference Optimization (DPO) is a powerful approach for aligning large language models (LLMs) with human preferences by formulating preference learning as a supervised classification problem over pairwise human-labeled outputs, thereby enabling stable and efficient training. We show that DPO inherits bias from confounders (e.g., topic, style, user objectives) that shape data generation and carry through to training, hindering recovery of true human preferences. We address this from a causal perspective, proposing Causal Direct Preference Optimization (CDPO), a general framework that incorporates causal inference principles to mitigate the influence of confounders and sharpen the signal of genuine human preferences. Our approach preserves the tractability of direct optimization while enhancing robustness to spurious correlations and annotation biases. Empirical evaluations on benchmark datasets show that CDPO surpasses DPO-based baselines by achieving unbiased fine-tuning through causal reasoning, confirming the effectiveness of confounder-aware preference optimization.

- Anthology ID: 2026.findings-eacl.58
- Volume: Findings of the Association for Computational Linguistics: EACL 2026
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 1098–1113
- URL: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.58/
- Cite (ACL): Uyen Le, Thin Nguyen, Toan Nguyen, Toan Doan, Trung Le, and Bac Le. 2026. Causal Direct Preference Optimization for Language Model Alignment. In Findings of the Association for Computational Linguistics: EACL 2026, pages 1098–1113, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal): Causal Direct Preference Optimization for Language Model Alignment (Le et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.58.pdf
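For context, the abstract describes DPO as casting preference learning as supervised classification over pairwise outputs. The standard DPO objective that the paper builds on can be sketched in plain Python as follows; this is a minimal illustration of vanilla DPO (the paper's CDPO variant is not reproduced here), and the per-sequence log-probability inputs and function name are assumptions for the sketch.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Vanilla DPO loss for a single preference pair (illustrative sketch).

    Inputs are total log-probabilities of the chosen/rejected responses
    under the policy being trained and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # Logistic (binary classification) loss on that margin -- this is the
    # "supervised classification over pairwise outputs" view of DPO.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With zero margin the loss is log(2); as the policy's preference for the chosen response grows relative to the reference, the loss decreases. The confounders discussed in the abstract (topic, style, user objectives) can distort which response is labeled "chosen", which is the bias CDPO targets.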