Learning to Generalize for Sequential Decision Making

Xusen Yin, Ralph Weischedel, Jonathan May


Abstract
We consider problems of making sequences of decisions to accomplish tasks, interacting via the medium of language. These problems are often tackled with reinforcement learning approaches. We find that these models do not generalize well when applied to novel task domains. However, the large amount of computation necessary to adequately train and explore the search space of sequential decision making, under a reinforcement learning paradigm, precludes the inclusion of large contextualized language models, which might otherwise enable the desired generalization ability. We introduce a teacher-student imitation learning methodology and a means of converting a reinforcement learning model into a natural language understanding model. Together, these methodologies enable the introduction of contextualized language models into the sequential decision making problem space. We show that models can learn faster and generalize more, leveraging both the imitation learning and the reformulation. Our models exceed teacher performance on various held-out decision problems, by up to 7% on in-domain problems and 24% on out-of-domain problems.
Anthology ID:
2020.findings-emnlp.273
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3046–3063
URL:
https://aclanthology.org/2020.findings-emnlp.273
DOI:
10.18653/v1/2020.findings-emnlp.273
Cite (ACL):
Xusen Yin, Ralph Weischedel, and Jonathan May. 2020. Learning to Generalize for Sequential Decision Making. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3046–3063, Online. Association for Computational Linguistics.
Cite (Informal):
Learning to Generalize for Sequential Decision Making (Yin et al., Findings 2020)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2020.findings-emnlp.273.pdf
Code:
yinxusen/learning_to_generalize