GRAD: Generalizing RAG Adaptation with Decoding

Youngwon Lee, Seung-won Hwang, Zhewei Yao, Yuxiong He


Abstract
Retrieval-augmented generation (RAG) requires generation to follow retrieved evidence across shifting domains and prompt layouts, but training a new, stronger model for each task is costly. To this end, we propose GRAD, an adaptive decoding-time framework that keeps the base generator fixed and composes small, objective-specific guidance at inference. A key advantage of this design is that diverse RAG objectives can be mixed and matched: model scaling (MS), domain adaptation (DA), and positional debiasing (DB) can be integrated as token-level guidance terms, and new objectives can be easily plugged in. Across public benchmarks and private settings with no in-domain labels, GRAD improves accuracy with favorable latency, offering strong trade-offs versus scaling while reliably activating helpful objectives and suppressing harmful ones, adapting to each task.
Anthology ID:
2026.acl-long.2099
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
45274–45290
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2099/
Cite (ACL):
Youngwon Lee, Seung-won Hwang, Zhewei Yao, and Yuxiong He. 2026. GRAD: Generalizing RAG Adaptation with Decoding. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45274–45290, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
GRAD: Generalizing RAG Adaptation with Decoding (Lee et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2099.pdf
Checklist:
2026.acl-long.2099.checklist.pdf