ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation

Kashob Kumar Roy, Pritom Saha Akash, Kevin Chen-Chuan Chang, Lucian Popa


Abstract
Open-domain long-form text generation requires generating coherent, comprehensive responses that address complex queries with both breadth and depth. This task is challenging due to the need to accurately capture diverse facets of input queries. Existing iterative retrieval-augmented generation (RAG) approaches often struggle to delve deeply into each facet of complex queries and integrate knowledge from various sources effectively. This paper introduces ConTReGen, a novel framework that employs a context-driven, tree-structured retrieval approach to enhance the depth and relevance of retrieved content. ConTReGen integrates a hierarchical, top-down in-depth exploration of query facets with a systematic bottom-up synthesis, ensuring comprehensive coverage and coherent integration of multifaceted information. Extensive experiments on multiple datasets, including LFQA and ODSUM, alongside a newly introduced dataset, ODSUM-WikiHow, demonstrate that ConTReGen outperforms existing state-of-the-art RAG models.
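The abstract describes a top-down decomposition of the input query into facets, recursive retrieval along a tree of sub-queries, and a bottom-up synthesis pass. The paper's code is not reproduced on this page, so the following is a minimal, hypothetical Python sketch of that control flow only; the callables decompose, retrieve, and synthesize stand in for the LLM- and retriever-backed components the paper actually uses, and the depth limit is an assumed parameter.

# Hypothetical sketch of a context-driven, tree-structured retrieve-then-synthesize loop.
# decompose/retrieve/synthesize are placeholders, not the paper's actual interfaces.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class FacetNode:
    query: str                                          # the (sub-)query this node explores
    passages: List[str] = field(default_factory=list)   # evidence retrieved for this facet
    children: List["FacetNode"] = field(default_factory=list)
    summary: str = ""                                    # bottom-up synthesis result for this subtree

def build_tree(query: str,
               decompose: Callable[[str, List[str]], List[str]],
               retrieve: Callable[[str], List[str]],
               depth: int = 2) -> FacetNode:
    """Top-down phase: retrieve for the query, then expand its facets recursively."""
    node = FacetNode(query=query, passages=retrieve(query))
    if depth > 0:
        # Facets are proposed in the context of what has already been retrieved.
        for facet in decompose(query, node.passages):
            node.children.append(build_tree(facet, decompose, retrieve, depth - 1))
    return node

def synthesize_tree(node: FacetNode,
                    synthesize: Callable[[str, List[str]], str]) -> str:
    """Bottom-up phase: summarize leaves first, then fold child summaries into parents."""
    child_summaries = [synthesize_tree(child, synthesize) for child in node.children]
    node.summary = synthesize(node.query, node.passages + child_summaries)
    return node.summary

In use, build_tree would be called on the user query to produce the facet tree, and synthesize_tree on its root to obtain the final long-form answer; the quality of both phases depends entirely on the retriever and generator plugged in for the placeholder callables.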
Anthology ID:
2024.findings-emnlp.807
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13773–13784
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.807/
DOI:
10.18653/v1/2024.findings-emnlp.807
Cite (ACL):
Kashob Kumar Roy, Pritom Saha Akash, Kevin Chen-Chuan Chang, and Lucian Popa. 2024. ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13773–13784, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation (Roy et al., Findings 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.807.pdf