@inproceedings{micheli-fleuret-2021-language,
title = "Language Models are Few-Shot Butlers",
author = "Micheli, Vincent and
Fleuret, Francois",
editor = "Moens, Marie-Francine and
Huang, Xuanjing and
Specia, Lucia and
Yih, Scott Wen-tau",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.emnlp-main.734/",
doi = "10.18653/v1/2021.emnlp-main.734",
pages = "9312--9318",
abstract = "Pretrained language models demonstrate strong performance in most NLP tasks when fine-tuned on small task-specific datasets. Hence, these autoregressive models constitute ideal agents to operate in text-based environments where language understanding and generative capabilities are essential. Nonetheless, collecting expert demonstrations in such environments is a time-consuming endeavour. We introduce a two-stage procedure to learn from a small set of demonstrations and further improve by interacting with an environment. We show that language models fine-tuned with only 1.2{\%} of the expert demonstrations and a simple reinforcement learning algorithm achieve a 51{\%} absolute improvement in success rate over existing methods in the ALFWorld environment."
}
Markdown (Informal)
[Language Models are Few-Shot Butlers](https://aclanthology.org/2021.emnlp-main.734/) (Micheli & Fleuret, EMNLP 2021)

ACL
Vincent Micheli and Francois Fleuret. 2021. Language Models are Few-Shot Butlers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9312–9318, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.