Causal Direct Preference Optimization for Language Model Alignment
Uyen Le, Thin Nguyen, Toan Nguyen, Toan Doan, Trung Le, Bac Le
Abstract
Direct Preference Optimization (DPO) is a powerful approach for aligning large language models (LLMs) with human preferences by formulating preference learning as a supervised classification problem over pairwise human-labeled outputs, thereby enabling stable and efficient training. We show that DPO inherits bias from confounders (e.g., topic, style, user objectives) that shape data generation and carry through to training, hindering recovery of true human preferences. We address this from a causal perspective, proposing Causal Direct Preference Optimization (CDPO), a general framework that incorporates causal inference principles to mitigate the influence of confounders and sharpen the signal of genuine human preferences. Our approach preserves the tractability of direct optimization while enhancing robustness to spurious correlations and annotation biases. Empirical evaluations on benchmark datasets show that CDPO surpasses DPO-based baselines by achieving unbiased fine-tuning through causal reasoning, confirming the effectiveness of confounder-aware preference optimization.

- Anthology ID: 2026.findings-eacl.58
- Volume: Findings of the Association for Computational Linguistics: EACL 2026
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 1098–1113
- URL: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.58/
- Cite (ACL): Uyen Le, Thin Nguyen, Toan Nguyen, Toan Doan, Trung Le, and Bac Le. 2026. Causal Direct Preference Optimization for Language Model Alignment. In Findings of the Association for Computational Linguistics: EACL 2026, pages 1098–1113, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal): Causal Direct Preference Optimization for Language Model Alignment (Le et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.58.pdf
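For context, the abstract describes DPO as casting preference learning as supervised classification over pairwise outputs. The standard DPO objective that the paper builds on can be sketched in plain Python as follows; this is a minimal illustration of vanilla DPO (the paper's CDPO variant is not reproduced here), and the per-sequence log-probability inputs and function name are assumptions for the sketch.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Vanilla DPO loss for a single preference pair (illustrative sketch).

    Inputs are total log-probabilities of the chosen/rejected responses
    under the policy being trained and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # Logistic (binary classification) loss on that margin -- this is the
    # "supervised classification over pairwise outputs" view of DPO.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With zero margin the loss is log(2); as the policy's preference for the chosen response grows relative to the reference, the loss decreases. The confounders discussed in the abstract (topic, style, user objectives) can distort which response is labeled "chosen", which is the bias CDPO targets.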