Persona-Consistent Dialogue Generation via Pseudo Preference Tuning
Junya Takayama, Masaya Ohagi, Tomoya Mizumoto, Katsumasa Yoshikawa
Abstract
We propose a simple yet effective method for enhancing persona consistency in dialogue response generation using Direct Preference Optimization (DPO). In our method, we generate responses from the response generation model using persona information randomly swapped in from other dialogues, and treat these responses as pseudo-negative samples. The reference responses serve as positive samples, allowing us to create pseudo-preference data. Experimental results demonstrate that our model, fine-tuned with DPO on the pseudo-preference data, produces more consistent and natural responses than models trained with supervised fine-tuning or with reinforcement learning based on entailment relations between personas and utterances.
- Anthology ID: 2025.coling-main.369
- Volume: Proceedings of the 31st International Conference on Computational Linguistics
- Month: January
- Year: 2025
- Address: Abu Dhabi, UAE
- Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
- Venue: COLING
- Publisher: Association for Computational Linguistics
- Pages: 5507–5514
- URL: https://preview.aclanthology.org/fix-sig-urls/2025.coling-main.369/
- Cite (ACL): Junya Takayama, Masaya Ohagi, Tomoya Mizumoto, and Katsumasa Yoshikawa. 2025. Persona-Consistent Dialogue Generation via Pseudo Preference Tuning. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5507–5514, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal): Persona-Consistent Dialogue Generation via Pseudo Preference Tuning (Takayama et al., COLING 2025)
- PDF: https://preview.aclanthology.org/fix-sig-urls/2025.coling-main.369.pdf
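As a concrete illustration of the pseudo-preference construction described in the abstract, the sketch below builds (prompt, chosen, rejected) triples by pairing each reference response with a response generated under a randomly swapped persona. The base model name, dataset fields, and prompt format are assumptions for illustration only, not the authors' exact setup.

```python
# A minimal sketch of pseudo-preference data construction for DPO.
# Model name, data fields, and prompt template are hypothetical.
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper's base model may differ

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def build_pseudo_preference_data(dialogues):
    """Create (prompt, chosen, rejected) triples for preference tuning.

    Each dialogue is assumed to be a dict with keys:
      "persona"  - persona description for the speaker
      "context"  - dialogue history as a single string
      "response" - the reference (gold) response
    """
    personas = [d["persona"] for d in dialogues]
    data = []
    for d in dialogues:
        # Swap in a persona taken from a different, randomly chosen dialogue.
        swapped = random.choice([p for p in personas if p is not d["persona"]])

        # Generate a response conditioned on the wrong persona; this serves
        # as the pseudo-negative (rejected) sample.
        prompt_swapped = f"Persona: {swapped}\nContext: {d['context']}\nResponse:"
        inputs = tokenizer(prompt_swapped, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True)
        rejected = tokenizer.decode(
            output_ids[0][inputs["input_ids"].shape[1]:],
            skip_special_tokens=True,
        )

        # The reference response under the correct persona is the positive
        # (chosen) sample.
        prompt = f"Persona: {d['persona']}\nContext: {d['context']}\nResponse:"
        data.append(
            {"prompt": prompt, "chosen": d["response"], "rejected": rejected}
        )
    return data
```

The resulting triples follow the standard prompt/chosen/rejected layout expected by common DPO trainers, so they can be fed directly to an off-the-shelf preference-tuning loop to fine-tune the response generator.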