DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation

Zining Liu, Yunhai Hu, Tianhua Xia, BO Bao, Eric Sather, Vithursan Thangarasa, Sai Qian Zhang


Abstract
Speculative decoding (SD) has proven to be an effective technique for accelerating autoregressive generation in large language models (LLMs), but its application to vision-language models (VLMs) remains relatively unexplored. We propose DREAM-S, a novel SD framework designed specifically for fast and efficient decoding in VLMs. DREAM-S uses neural architecture search (NAS) with target-aware supernet training to automatically identify both the optimal interaction strategy between the draft and target models and the draft model architecture best suited to the underlying hardware platform. DREAM-S additionally incorporates adaptive intermediate feature distillation, guided by attention entropy, to enable efficient draft training. Experiments on a range of well-established VLMs show that DREAM-S achieves up to a 3.85× speedup over standard decoding and significantly outperforms existing SD baselines.
Anthology ID:
2026.acl-long.2177
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
47031–47045
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2177/
Cite (ACL):
Zining Liu, Yunhai Hu, Tianhua Xia, BO Bao, Eric Sather, Vithursan Thangarasa, and Sai Qian Zhang. 2026. DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 47031–47045, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation (Liu et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2177.pdf
Checklist:
2026.acl-long.2177.checklist.pdf