DynamicTOC: Persona-based Table of Contents for Consumption of Long Documents

Himanshu Maheshwari, Nethraa Sivakumar, Shelly Jain, Tanvi Karandikar, Vinay Aggarwal, Navita Goyal, Sumit Shekhar


Abstract
Long documents such as contracts and financial documents are often tedious to read. Consuming these documents linearly (by scrolling or navigating through the default table of contents) is time-consuming and challenging. These documents are also authored to be consumed by varied entities (referred to as personas in the paper), each interested in only certain parts of the document. In this work, we describe DynamicToC, a dynamic table-of-contents-based navigator that aids non-linear, persona-based document consumption. DynamicToC highlights sections of interest in the document according to the aspects relevant to different personas. DynamicToC is augmented with short questions to help users understand the underlying content. The system uses a novel deep reinforcement learning technique to generate questions on these persona-clustered paragraphs. Human and automatic evaluations suggest the efficacy of both the end-to-end pipeline and the individual components of DynamicToC.
Anthology ID:
2022.naacl-main.378
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5133–5143
URL:
https://aclanthology.org/2022.naacl-main.378
DOI:
10.18653/v1/2022.naacl-main.378
Cite (ACL):
Himanshu Maheshwari, Nethraa Sivakumar, Shelly Jain, Tanvi Karandikar, Vinay Aggarwal, Navita Goyal, and Sumit Shekhar. 2022. DynamicTOC: Persona-based Table of Contents for Consumption of Long Documents. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5133–5143, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
DynamicTOC: Persona-based Table of Contents for Consumption of Long Documents (Maheshwari et al., NAACL 2022)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2022.naacl-main.378.pdf
Video:
https://preview.aclanthology.org/nodalida-main-page/2022.naacl-main.378.mp4
Data:
ELI5