RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
Wanlong Liu, Junying Chen, Ke Ji, Li Zhou, Wenyu Chen, Benyou Wang
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a key paradigm for enhancing large language models by incorporating external knowledge. However, current RAG methods exhibit limited capabilities in complex RAG scenarios and cover only a narrow range of tasks. To address these limitations, we propose RAG-Instruct, a general method for synthesizing diverse and high-quality RAG instruction data from any source corpus. Our approach leverages (1) five RAG paradigms, which encompass diverse query-document relationships, and (2) instruction simulation, which enhances instruction diversity and quality by utilizing the strengths of existing instruction datasets. Using this method, we construct a 40K instruction dataset from Wikipedia, comprehensively covering diverse RAG scenarios and tasks. Experiments demonstrate that RAG-Instruct effectively enhances LLMs’ RAG capabilities, achieving strong zero-shot performance and significantly outperforming various RAG baselines across a diverse set of tasks.
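The abstract describes a two-part recipe: sample documents, target one of five query-document relationships, and imitate an exemplar instruction to synthesize a new (instruction, answer) pair. The sketch below illustrates that idea in Python; the paradigm labels, the `synthesize_example` helper, and the `generate` callable are all hypothetical stand-ins, not identifiers from the paper, and the paper's actual paradigm definitions and prompting details may differ.

```python
import random

# Hypothetical labels for the five query-document relationships the
# abstract mentions (assumptions, not the paper's own taxonomy).
RAG_PARADIGMS = [
    "answer_in_single_doc",            # query answerable from one retrieved doc
    "answer_spans_multiple_docs",      # evidence must be combined across docs
    "docs_partially_relevant",         # some retrieved docs are distractors
    "docs_irrelevant",                 # model must fall back on parametric knowledge
    "docs_helpful_but_insufficient",   # docs assist but do not fully answer
]

def synthesize_example(corpus_docs, exemplar_instruction, generate):
    """Sketch of one synthesis step: pair sampled documents with a
    paradigm and an exemplar instruction, then ask an LLM to write a
    new instruction-answer pair grounded in the documents.
    `generate` is a stand-in for any LLM completion function."""
    paradigm = random.choice(RAG_PARADIGMS)
    docs = random.sample(corpus_docs, k=3)  # assumes len(corpus_docs) >= 3
    prompt = (
        "Documents:\n" + "\n---\n".join(docs) + "\n\n"
        f"Query-document relationship to target: {paradigm}\n"
        "Style exemplar (imitate its task type and phrasing):\n"
        f"{exemplar_instruction}\n\n"
        "Write a new instruction that fits this relationship, then "
        "answer it, using the documents where they help."
    )
    return generate(prompt)  # returns the synthesized instruction-answer text
```

Running this loop over a corpus such as Wikipedia, with exemplars drawn from existing instruction datasets, would yield training data spanning the diverse RAG scenarios the abstract claims; the actual pipeline in the paper may filter or post-process candidates in ways not shown here.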
- Anthology ID: 2025.emnlp-main.192
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 3865–3888
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.192/
- Cite (ACL): Wanlong Liu, Junying Chen, Ke Ji, Li Zhou, Wenyu Chen, and Benyou Wang. 2025. RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3865–3888, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions (Liu et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.192.pdf