Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models

Jiatao Li; Xinyu Hu; Xunjian Yin; Xiaojun Wan

Abstract

The integration of documents generated by large language models (LLMs) themselves (Self-Docs) alongside retrieved documents has emerged as a promising strategy for retrieval-augmented generation (RAG) systems. However, previous research primarily focuses on optimizing the use of Self-Docs, with their inherent properties remaining underexplored. To bridge this gap, we first investigate the overall effectiveness of Self-Docs, identifying key factors that shape their contribution to RAG performance (RQ1). Building on these insights, we develop a taxonomy grounded in Systemic Functional Linguistics to compare the influence of various Self-Docs categories (RQ2) and explore strategies for combining them with external sources (RQ3). Our findings reveal which types of Self-Docs are most beneficial and offer practical guidelines for leveraging them to achieve significant improvements in knowledge-intensive question answering tasks.

Anthology ID: 2025.findings-naacl.149
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2741–2775
URL: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.149/
Cite (ACL): Jiatao Li, Xinyu Hu, Xunjian Yin, and Xiaojun Wan. 2025. Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2741–2775, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models (Li et al., Findings 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.149.pdf
