Abstract
We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods and show the efficacy of BLS on sequence labeling and dependency parsing tasks.
- Anthology ID: W17-4304
- Volume: Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing
- Month: September
- Year: 2017
- Address: Copenhagen, Denmark
- Editors: Kai-Wei Chang, Ming-Wei Chang, Vivek Srikumar, Alexander M. Rush
- Venue: WS
- Publisher: Association for Computational Linguistics
- Pages: 17–26
- URL: https://aclanthology.org/W17-4304
- DOI: 10.18653/v1/W17-4304
- Cite (ACL): Amr Sharaf and Hal Daumé III. 2017. Structured Prediction via Learning to Search under Bandit Feedback. In Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing, pages 17–26, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal): Structured Prediction via Learning to Search under Bandit Feedback (Sharaf & Daumé III, 2017)
- PDF: https://preview.aclanthology.org/emnlp22-frontmatter/W17-4304.pdf
- Data: Penn Treebank