Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency


Abstract
Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. Yet many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing NANO, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Compared to previous work, NANO achieves state-of-the-art results on single-topic/attribute control as well as quantified distribution control. We also show that NANO learns unquantified distributions, achieves personalization, and captures differences between individuals' personal preferences with high sample efficiency.
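The paper's nested few-shot algorithm is detailed in the full text; as a rough, hypothetical illustration of the outer human-in-the-loop reward-learning pattern the abstract describes, the toy sketch below alternates generation, human scoring, and a reward-driven model update. All names, the one-parameter "model", and the update rule are illustrative assumptions, not the authors' actual NANO implementation.

```python
import random

def sample_text(p_a, prompt):
    """Stand-in for sampling from a language model: a toy one-parameter
    'policy' that emits one of two continuations."""
    return prompt + " A" if random.random() < p_a else prompt + " B"

def human_score(text):
    """Stand-in for a human rating in [0, 1]; a real system would show the
    text to a person and record their judgment."""
    return 1.0 if text.endswith("A") else 0.0  # pretend the annotator prefers A

def human_in_the_loop_training(rounds=10, batch=8, lr=0.1):
    p_a = 0.5                      # toy model parameter: P(emit continuation A)
    for _ in range(rounds):
        # 1. Generate a small batch of texts with the current model.
        texts = [sample_text(p_a, "prompt") for _ in range(batch)]
        # 2. Collect human feedback on just these few samples (few-shot).
        rewards = [human_score(t) for t in texts]
        # 3. Nudge the model toward higher-reward outputs
        #    (a crude REINFORCE-style step on the toy parameter).
        baseline = sum(rewards) / len(rewards)
        for t, r in zip(texts, rewards):
            direction = 1.0 if t.endswith("A") else -1.0
            p_a += lr * (r - baseline) * direction
        p_a = min(max(p_a, 0.01), 0.99)  # keep the probability valid
    return p_a

if __name__ == "__main__":
    print("final P(prefer A) =", human_in_the_loop_training())
```

Running the sketch drives the toy parameter toward the continuations the simulated annotator rewards; in the paper's setting, the same generate-score-update loop is applied to an actual language model and to learned (rather than hard-coded) rewards.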
Anthology ID: 2023.findings-acl.758
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11970–11992
URL: https://aclanthology.org/2023.findings-acl.758
DOI: 10.18653/v1/2023.findings-acl.758
Cite (ACL): Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, and Louis-Philippe Morency. 2023. Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11970–11992, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control (Fan et al., Findings 2023)
PDF: https://preview.aclanthology.org/naacl24-info/2023.findings-acl.758.pdf
Video: https://preview.aclanthology.org/naacl24-info/2023.findings-acl.758.mp4