Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering

Anas Mohamed; Azal Ahmad Khan; Xinran Wang; Ahmad Faraz Khan; Shuwen Ge; Saman Bahzad Khan; Ayaan Ahmad; Ali Anwar

Abstract

Generative AI can now synthesize strikingly realistic images from text, yet output quality remains highly sensitive to how prompts are phrased. Direct Preference Optimization (DPO) offers a lightweight, off-policy alternative to RL for automatic prompt engineering, but its token-level regularization leaves semantic inconsistency unchecked: prompts that win higher preference scores can still drift away from the user’s intended meaning. We introduce Sem-DPO, a variant of DPO that preserves semantic consistency while retaining DPO’s simplicity and efficiency. Sem-DPO scales the DPO loss by a weight that reflects how far the winning prompt is from the original prompt, reducing the impact of training examples that are semantically misaligned. We provide the first analytical bound on semantic drift for preference-tuned prompt generators, showing that Sem-DPO keeps learned prompts within a provably bounded neighborhood of the original text. On three standard text-to-image prompt-optimization benchmarks and three language models, Sem-DPO achieves 8–12% higher CLIP similarity and 5–9% higher human-preference scores (HPSv2.1, PickScore) than DPO, and it also outperforms state-of-the-art prompt-optimization baselines and several DPO variants. These findings suggest that strong flat baselines augmented with semantic weighting should become the new standard for prompt-optimization studies, and they lay the groundwork for broader, semantics-aware preference optimization in language models.
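
The weighting idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the exponential weight `exp(-alpha * d)`, the use of cosine distance between prompt embeddings as `d`, the function names, and the default values of `alpha` and `beta`. The per-example DPO term is the standard `-log sigmoid(beta * margin)` form.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard per-example DPO loss: -log sigmoid(beta * margin)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def semantic_weight(orig_emb, winner_emb, alpha=1.0):
    """Hypothetical weight: decays as the winning prompt drifts from the original."""
    return math.exp(-alpha * cosine_distance(orig_emb, winner_emb))


def sem_weighted_dpo_loss(orig_emb, winner_emb,
                          logp_w, logp_l, ref_logp_w, ref_logp_l,
                          alpha=1.0, beta=0.1):
    """Assumed form of a semantically weighted DPO loss: weight * DPO loss."""
    weight = semantic_weight(orig_emb, winner_emb, alpha)
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
```

Under these assumptions, a winning prompt whose embedding matches the original gets weight 1 and contributes the ordinary DPO loss, while a winning prompt that has drifted semantically contributes less to training. In practice the embeddings would come from a text encoder, and the log-probabilities from the policy and the frozen reference model.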

Anthology ID: 2026.findings-acl.1184
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 23656–23674
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1184/
Cite (ACL): Anas Mohamed, Azal Ahmad Khan, Xinran Wang, Ahmad Faraz Khan, Shuwen Ge, Saman Bahzad Khan, Ayaan Ahmad, and Ali Anwar. 2026. Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23656–23674, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering (Mohamed et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1184.pdf
Checklist: 2026.findings-acl.1184.checklist.pdf
