Consultant Decoding: Yet Another Synergistic Mechanism

Chuanghao Ding, Jiaping Wang, Ziqing Yang, Xiaoliang Wang, Dahua Lin, Nguyen Cam-Tu, Fei Tan


Abstract
The synergistic mechanism based on Speculative Decoding (SD) has garnered considerable attention as a simple yet effective approach for accelerating the inference of large language models (LLMs). Nonetheless, high rejection rates require repeated LLM calls to validate draft tokens, undermining the overall efficiency gains of SD. In this work, we revisit existing verification mechanisms and propose a novel synergistic mechanism, Consultant Decoding (CD). CD achieves up to a 2.5-fold increase in inference speed compared to the target model, while maintaining comparable generation quality (~100% of the target model’s performance). Interestingly, this is achieved by combining models whose parameter sizes differ by two orders of magnitude. In addition, CD reduces the call frequency of the large target model to below 10%, particularly in more demanding tasks. CD’s performance was even found to surpass that of the large target model, which theoretically represents the upper bound for speculative decoding.
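For context, the verification step the abstract refers to is the standard speculative-decoding accept/reject rule (Leviathan et al., 2023), in which the target model re-scores each draft token and accepts it with probability min(1, p_target/p_draft). The sketch below illustrates that standard rule only; it is not the paper's Consultant Decoding criterion, and the function name and tensor shapes are illustrative assumptions.

```python
import torch

def sd_verify(draft_tokens, p_draft, p_target):
    """Minimal sketch of standard speculative-decoding verification.

    draft_tokens: list[int], K tokens proposed by the small draft model.
    p_draft, p_target: (K, vocab) probability tensors from the draft and
        target models at the same positions.
    Returns the number of draft tokens accepted before the first rejection.
    """
    accepted = 0
    for i, tok in enumerate(draft_tokens):
        q = p_draft[i, tok].item()   # draft-model probability of the proposed token
        p = p_target[i, tok].item()  # target-model probability of the same token
        # Accept with probability min(1, p/q); otherwise stop and let the
        # target model resample at this position.
        if torch.rand(1).item() < min(1.0, p / q):
            accepted += 1
        else:
            break
    return accepted
```

High rejection rates in this loop mean frequent early exits, so the large target model must be called again and again, which is the inefficiency CD aims to reduce.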
Anthology ID:
2025.findings-acl.797
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15438–15452
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.797/
Cite (ACL):
Chuanghao Ding, Jiaping Wang, Ziqing Yang, Xiaoliang Wang, Dahua Lin, Nguyen Cam-Tu, and Fei Tan. 2025. Consultant Decoding: Yet Another Synergistic Mechanism. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15438–15452, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Consultant Decoding: Yet Another Synergistic Mechanism (Ding et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.797.pdf