MDPO: Customized Direct Preference Optimization with a Metric-based Sampler for Question and Answer Generation

Yihang Wang, Bowen Tian, Yueyang Su, Yixing Fan, Jiafeng Guo


Abstract
With the extensive use of large language models, automatically generating QA datasets for domain-specific fine-tuning has become crucial. However, considering the multifaceted demands for readability, diversity, and comprehensiveness of QA data, current methodologies fall short in producing high-quality QA datasets. Moreover, the dependence of existing evaluation metrics on ground truth labels further exacerbates the challenges associated with the selection of QA data. In this paper, we introduce a novel method for QA data generation, denoted as MDPO. We propose a set of unsupervised evaluation metrics for QA data, enabling multidimensional assessment based on the relationships among context, question, and answer. Furthermore, leveraging these metrics, we implement a customized direct preference optimization process that guides large language models to produce high-quality and domain-specific QA pairs. Empirical results on public datasets indicate that MDPO’s performance substantially surpasses that of state-of-the-art methods.
Anthology ID:
2025.coling-main.711
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
10660–10671
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.coling-main.711/
Cite (ACL):
Yihang Wang, Bowen Tian, Yueyang Su, Yixing Fan, and Jiafeng Guo. 2025. MDPO: Customized Direct Preference Optimization with a Metric-based Sampler for Question and Answer Generation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 10660–10671, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
MDPO: Customized Direct Preference Optimization with a Metric-based Sampler for Question and Answer Generation (Wang et al., COLING 2025)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.coling-main.711.pdf