How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective

Teng Xiao; Mingxiao Li; Yige Yuan; Huaisheng Zhu; Chao Cui; Vasant G Honavar

doi:10.18653/v1/2024.emnlp-main.744

How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective

Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, Vasant G Honavar

Abstract

This paper introduces a novel generalized self-imitation learning GSIL framework, which effectively and efficiently aligns large language models with offline demonstration data. We develop GSIL by deriving a surrogate objective of imitation learning with density ratio estimates, facilitating the use of self-generated data and optimizing the imitation learning objective with simple classification losses. GSIL eliminates the need for complex adversarial training in standard imitation learning, achieving lightweight and efficient fine-tuning for large language models. In addition, GSIL encompasses a family of offline losses parameterized by a general class of convex functions for density ratio estimation and enables a unified view for alignment with demonstration data. Extensive experiments show that GSIL consistently and significantly outperforms baselines in many challenging benchmarks, such as coding (HuamnEval), mathematical reasoning (GSM8K) and instruction-following benchmark (MT-Bench). Code is public available at https://github.com/tengxiao1/GSIL.

Anthology ID:: 2024.emnlp-main.744
Volume:: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Editors:: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 13413–13426
Language:
URL:: https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.emnlp-main.744/
DOI:: 10.18653/v1/2024.emnlp-main.744
Bibkey:
Cite (ACL):: Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, and Vasant G Honavar. 2024. How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 13413–13426, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):: How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective (Xiao et al., EMNLP 2024)
Copy Citation:
PDF:: https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.emnlp-main.744.pdf

PDF Cite Search Fix data