Abstract
Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.
- Anthology ID:
- N19-1269
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Editors:
- Jill Burstein, Christy Doran, Thamar Solorio
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2622–2632
- URL:
- https://aclanthology.org/N19-1269
- DOI:
- 10.18653/v1/N19-1269
- Cite (ACL):
- Shrimai Prabhumoye, Chris Quirk, and Michel Galley. 2019. Towards Content Transfer through Grounded Text Generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2622–2632, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Towards Content Transfer through Grounded Text Generation (Prabhumoye et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/N19-1269.pdf