@inproceedings{huang-etal-2023-learning-preference,
title = "Learning Preference Model for {LLM}s via Automatic Preference Data Generation",
author = "Huang, Shijia and
Zhao, Jianqiao and
Li, Yanyang and
Wang, Liwei",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/add-emnlp-2024-awards/2023.emnlp-main.570/",
doi = "10.18653/v1/2023.emnlp-main.570",
pages = "9187--9199",
abstract = "Despite the advanced capacities of the state-of-the-art large language models (LLMs), they suffer from issues of hallucination, stereotype, etc. Preference models play an important role in LLM alignment, yet training preference models predominantly rely on human-annotated data. This reliance limits their versatility and scalability. In this paper, we propose learning the preference model for LLMs via automatic preference data generation (AutoPM). Our approach involves both In-Breadth Data Generation, which elicits pairwise preference data from LLMs following the helpful-honest-harmless (HHH) criteria, and In-Depth Data Generation, which enriches the dataset with responses spanning a wide quality range. With HHH-guided preference data, our approach simultaneously enables the LLMs to learn human preferences and align with human values. Quantitative assessments on five benchmark datasets demonstrate the reliability and potential of AutoPM, pointing out a more general and scalable way to improve LLM performance."
}