A Grounded Preference Model for LLM Alignment
Tahira Naseem, Guangxuan Xu, Sarathkrishna Swaminathan, Asaf Yehudai, Subhajit Chaudhury, Radu Florian, Ramón Astudillo, Asim Munawar
Abstract
Despite recent advancements, LLMs still suffer from factual inconsistency and hallucination. A common remedy is retrieval-augmented generation; however, there is no guarantee that the model will strictly adhere to the retrieved grounding. Fundamentally, LLMs need to be aligned to be more faithful to grounding, which requires high-quality preference annotations. This paper investigates whether we can create high-quality grounded preference data for model alignment without using annotations from humans or large proprietary models. We experimented with existing entailment data and proposed approaches to generate synthetic grounded preference data, with which we train a Grounded Preference Model (GPM). We demonstrate through Proximal Policy Optimization (PPO) training of Mistral-7B-Instruct that our GPM can successfully align powerful LLMs to generate much better grounded responses, as judged by GPT-4. Moreover, we show that our GPM is also a strong faithfulness classifier, achieving state-of-the-art results on the dialogue sub-tasks of the TRUE faithfulness benchmark. We will release our GPM under the Apache 2.0 license.
- Anthology ID:
- 2024.findings-acl.10
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2024
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 151–162
- URL:
- https://aclanthology.org/2024.findings-acl.10
- DOI:
- 10.18653/v1/2024.findings-acl.10
- Cite (ACL):
- Tahira Naseem, Guangxuan Xu, Sarathkrishna Swaminathan, Asaf Yehudai, Subhajit Chaudhury, Radu Florian, Ramón Astudillo, and Asim Munawar. 2024. A Grounded Preference Model for LLM Alignment. In Findings of the Association for Computational Linguistics: ACL 2024, pages 151–162, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- A Grounded Preference Model for LLM Alignment (Naseem et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/autopr/2024.findings-acl.10.pdf