When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models

Shufan Chen, He Zheng, Lei Cui


Abstract
Although large language models rely on parametric knowledge to achieve exceptional performance across various question-answering tasks, they still face challenges when addressing knowledge-based long-tail questions. Augmented generation techniques, such as chain-of-thought prompting and retrieval augmentation, can effectively enhance the ability of these models to answer long-tail questions. However, improving accuracy through augmented generation often results in significant latency within question-answering systems. This paper addresses the issue of “when and how to augment the input” by proposing an adaptive question routing framework. This framework employs a query router to select the most appropriate augmentation path at the right time, thereby enhancing both the accuracy and efficiency of question-answering systems. Extensive comparative experiments on benchmarks such as AmbigNQ, HotpotQA, MMLU-STEM, and PopQA demonstrate that our method surpasses existing approaches in both accuracy and efficiency. Furthermore, this paper introduces two metrics for evaluating adaptive question augmentation methods and presents a new benchmark for adaptive question augmentation, aiming to advance the field.
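The abstract describes a query router that decides, per question, whether to answer directly or to augment the input (e.g., with chain-of-thought prompting or retrieval). Below is a minimal illustrative sketch of that routing idea; the function names, thresholds, and three augmentation paths are placeholders chosen for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch of adaptive question routing: a lightweight router picks
# the cheapest augmentation path expected to answer the question correctly,
# trading off accuracy against latency. All names (score_difficulty,
# needs_evidence, retrieve, ...) are illustrative placeholders.

from typing import Callable, List


def direct_prompt(question: str) -> str:
    # No augmentation: lowest latency, relies on parametric knowledge.
    return question


def cot_prompt(question: str) -> str:
    # Chain-of-thought prompting for questions that need multi-step reasoning.
    return f"{question}\nLet's think step by step."


def rag_prompt(question: str, retrieve: Callable[[str], List[str]]) -> str:
    # Retrieval augmentation for knowledge-heavy, long-tail questions.
    passages = "\n".join(retrieve(question))
    return f"Context:\n{passages}\n\nQuestion: {question}"


def route_question(
    question: str,
    score_difficulty: Callable[[str], float],
    needs_evidence: Callable[[str], bool],
) -> str:
    """Return which augmentation path to take for this question."""
    if needs_evidence(question):          # long-tail factual question -> retrieval
        return "retrieval"
    if score_difficulty(question) > 0.5:  # reasoning-heavy question -> chain of thought
        return "cot"
    return "direct"                       # easy question -> answer directly
```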
Anthology ID:
2025.findings-naacl.200
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3621–3634
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.200/
Cite (ACL):
Shufan Chen, He Zheng, and Lei Cui. 2025. When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3621–3634, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models (Chen et al., Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.200.pdf