Jakiro: Boosting Speculative Decoding via Decoupled MoE

Haiduo Huang, Fuwei Yang, Zhenhua Liu, Pengju Ren


Abstract
Speculative decoding has emerged as a promising technique for accelerating large language model inference: a smaller draft model predicts multiple tokens, which the larger target model then verifies in parallel. However, existing approaches face a fundamental limitation: candidates at the same tree layer share identical feature representations, which constrains their diversity and diminishes overall effectiveness. We identify this as an intra-layer coupling problem that limits prediction accuracy. To address it, we propose Jakiro, which introduces a decoupled Mixture of Experts (MoE) into the draft model, enabling different experts to generate diverse candidate tokens from distinct feature spaces. We further propose Contrastive-Enhanced Parallel Decoding (CEPD), which combines autoregressive and parallel decoding with a contrastive mechanism to reduce the number of inference steps while maintaining accuracy. Extensive experiments across diverse models and tasks demonstrate that Jakiro achieves significant speedups over strong baselines, with particularly notable improvements in non-greedy decoding scenarios where token diversity is crucial.
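For readers unfamiliar with the draft-then-verify loop that the abstract builds on, the following is a minimal sketch of one generic speculative decoding step under greedy decoding. It is not Jakiro's method: it uses a single draft chain rather than a candidate tree, and it has no MoE or CEPD component. The names `speculative_step`, `toy_target`, and `toy_draft` are hypothetical, and the two "models" are toy integer functions assumed purely for illustration.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Run one draft-then-verify step and return the tokens accepted in it."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The target model checks each drafted position. A real system scores
    #    all positions in one batched forward pass; a loop stands in for that.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target_next(ctx))
    return accepted


def toy_target(ctx: List[int]) -> int:
    # Toy target (assumption): counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy draft (assumption): agrees with the target except after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step([0], toy_draft, toy_target, k=4))
    print(speculative_step([5], toy_draft, toy_target, k=3))
```

Under greedy decoding the accepted tokens are exactly what the target model would have produced on its own, so any speedup comes from how many draft tokens survive verification per step. That is why the quality and diversity of draft candidates matter.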
Anthology ID:
2026.acl-long.487
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
10649–10668
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.487/
Cite (ACL):
Haiduo Huang, Fuwei Yang, Zhenhua Liu, and Pengju Ren. 2026. Jakiro: Boosting Speculative Decoding via Decoupled MoE. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10649–10668, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Jakiro: Boosting Speculative Decoding via Decoupled MoE (Huang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.487.pdf
Checklist:
 2026.acl-long.487.checklist.pdf