@inproceedings{toledo-etal-2023-policy,
    title = "Policy-based Reinforcement Learning for Generalisation in Interactive Text-based Environments",
    author = "Toledo, Edan  and
      Buys, Jan  and
      Shock, Jonathan",
    editor = "Vlachos, Andreas  and
      Augenstein, Isabelle",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.eacl-main.88/",
    doi = "10.18653/v1/2023.eacl-main.88",
    pages = "1230--1242",
    abstract = "Text-based environments enable RL agents to learn to converse and perform interactive tasks through natural language. However, previous RL approaches applied to text-based environments show poor performance when evaluated on unseen games. This paper investigates the improvement of generalisation performance through the simple switch from a value-based update method to a policy-based one, within text-based environments. We show that by replacing commonly used value-based methods with REINFORCE with baseline, a far more general agent is produced. The policy-based agent is evaluated on Coin Collector and Question Answering with interactive text (QAit), two text-based environments designed to test zero-shot performance. We see substantial improvements on a variety of zero-shot evaluation experiments, including tripling accuracy on various QAit benchmark configurations. The results indicate that policy-based RL has significantly better generalisation capabilities than value-based methods within such text-based environments, suggesting that RL agents could be applied to more complex natural language environments."
}