Towards Content Transfer through Grounded Text Generation

Shrimai Prabhumoye, Chris Quirk, Michel Galley


Abstract
Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.
Anthology ID:
N19-1269
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2622–2632
URL:
https://aclanthology.org/N19-1269
DOI:
10.18653/v1/N19-1269
Cite (ACL):
Shrimai Prabhumoye, Chris Quirk, and Michel Galley. 2019. Towards Content Transfer through Grounded Text Generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2622–2632, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Towards Content Transfer through Grounded Text Generation (Prabhumoye et al., NAACL 2019)
PDF:
https://preview.aclanthology.org/naacl24-info/N19-1269.pdf
Video:
https://preview.aclanthology.org/naacl24-info/N19-1269.mp4