LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, Jie Tang
Abstract
Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context Large Language Models (LLMs) for LCQA often struggle with the “lost in the middle” issue. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts the global long-context information, and its low-quality retrieval in long contexts hinders LLMs from identifying effective factual details due to substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA to enhance RAG’s understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm, facilitating adaptation to various domains and LLMs. Extensive experiments on three multi-hop datasets demonstrate that LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and Vanilla RAG (up by 17.25%). Furthermore, we conduct quantitative ablation studies and multi-dimensional analyses, highlighting the effectiveness of the system’s components and fine-tuning strategies.Data and code are available at [https://github.com/QingFei1/LongRAG](https://github.com/QingFei1/LongRAG).- Anthology ID:
- 2024.emnlp-main.1259
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 22600–22632
- Language:
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.1259/
- DOI:
- 10.18653/v1/2024.emnlp-main.1259
- Cite (ACL):
- Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, and Jie Tang. 2024. LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 22600–22632, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering (Zhao et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.1259.pdf