Policy-based Reinforcement Learning for Generalisation in Interactive Text-based Environments

Edan Toledo, Jan Buys, Jonathan Shock


Abstract
Text-based environments enable RL agents to learn to converse and perform interactive tasks through natural language. However, previous RL approaches applied to text-based environments show poor performance when evaluated on unseen games. This paper investigates the improvement of generalisation performance through the simple switch from a value-based update method to a policy-based one, within text-based environments. We show that by replacing commonly used value-based methods with REINFORCE with baseline, a far more general agent is produced. The policy-based agent is evaluated on Coin Collector and Question Answering with interactive text (QAit), two text-based environments designed to test zero-shot performance. We see substantial improvements on a variety of zero-shot evaluation experiments, including tripling accuracy on various QAit benchmark configurations. The results indicate that policy-based RL has significantly better generalisation capabilities than value-based methods within such text-based environments, suggesting that RL agents could be applied to more complex natural language environments.
Anthology ID:
2023.eacl-main.88
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1230–1242
URL:
https://aclanthology.org/2023.eacl-main.88
DOI:
10.18653/v1/2023.eacl-main.88
Cite (ACL):
Edan Toledo, Jan Buys, and Jonathan Shock. 2023. Policy-based Reinforcement Learning for Generalisation in Interactive Text-based Environments. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 1230–1242, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Policy-based Reinforcement Learning for Generalisation in Interactive Text-based Environments (Toledo et al., EACL 2023)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2023.eacl-main.88.pdf
Video:
https://preview.aclanthology.org/add_acl24_videos/2023.eacl-main.88.mp4