Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks

Grigorii Guz, Giuseppe Carenini


Abstract
Text structuring is a fundamental step in NLG, especially when generating multi-sentential text. With the goal of fostering more general and data-driven approaches to text structuring, we propose the new and domain-independent NLG task of structuring and ordering a (possibly large) set of EDUs. We then present a solution for this task that combines neural dependency tree induction with pointer networks, and can be trained on large discourse treebanks that have only recently become available. Further, we propose a new evaluation metric that is arguably more suitable for our new task compared to existing content ordering metrics. Finally, we empirically show that our approach outperforms competitive alternatives on the proposed measure and matches them on previously established measures.
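The abstract's ordering component relies on a pointer network, which at each decoding step attends over the input EDU encodings and "points" at the position to emit next. The sketch below is a minimal, illustrative toy (not the paper's model): it uses plain dot-product attention and random toy vectors, assuming one encoder state per EDU and a single decoder state.

```python
import math
import random

def pointer_step(decoder_state, encoder_states):
    """One pointer-network step: score each EDU encoding against the
    current decoder state (dot-product attention), then softmax the
    scores into a distribution over input positions (the 'pointer')."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
dim, n_edus = 4, 5
enc = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_edus)]
dec = [random.gauss(0, 1) for _ in range(dim)]

probs = pointer_step(dec, enc)                    # distribution over 5 EDUs
next_edu = max(range(n_edus), key=probs.__getitem__)  # EDU emitted next
```

In a full ordering model, the chosen EDU's encoding would feed the next decoder step and already-emitted positions would be masked out; both are omitted here for brevity.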
Anthology ID:
2020.findings-emnlp.281
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3141–3152
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2020.findings-emnlp.281/
DOI:
10.18653/v1/2020.findings-emnlp.281
Cite (ACL):
Grigorii Guz and Giuseppe Carenini. 2020. Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3141–3152, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks (Guz & Carenini, Findings 2020)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2020.findings-emnlp.281.pdf