Xiao Zheng
2026
Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference
Zhouxuwen | Fangxin Liu | Chao Wang | Xiao Zheng | Hao Zheng | Min He | Li Jiang | Haibing Guan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Speculative decoding accelerates autoregressive generation by letting draft tokens bypass full verification, but conventional frameworks suffer from frequent false rejections, particularly when draft models produce semantically correct but lexically divergent outputs. In this paper, we present Calibrated Speculative Decoding (CSD), a training-free framework that recovers valid tokens discarded by standard verification. Guided by the principle of "Frequency-Guided Candidate Selection and Probability-Guarded Acceptance," CSD incorporates two lightweight modules: Online Correction Memory, which aggregates historical rejections to propose recurring divergence patterns as rescue candidates, and Semantic Consistency Gating, which verifies candidate admissibility using probability ratios instead of exact token matching. Our evaluation across diverse large language models demonstrates that CSD outperforms existing methods, achieving a peak throughput speedup of 2.33x. CSD preserves model accuracy across all tasks while further boosting performance on complex reasoning datasets. These results establish CSD as a highly effective, lightweight solution for practical LLM deployments.
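The two modules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `CorrectionMemory`, the function `verify_token`, the `min_count` and `ratio_threshold` parameters, the greedy (argmax) verification baseline, and the exact form of the probability ratio are all assumptions made for illustration. The abstract states only that past rejections are aggregated into rescue candidates and that admissibility is checked with probability ratios rather than exact token matching.

```python
from collections import Counter


class CorrectionMemory:
    """Hypothetical stand-in for Online Correction Memory.

    Counts how often a given draft token was rejected in favour of a given
    target token, and treats frequently recurring pairs as rescue candidates.
    """

    def __init__(self, min_count=2):
        self.counts = Counter()
        self.min_count = min_count  # assumed frequency cutoff

    def record(self, draft_token, target_token):
        self.counts[(draft_token, target_token)] += 1

    def is_candidate(self, draft_token, target_token):
        return self.counts[(draft_token, target_token)] >= self.min_count


def verify_token(draft_token, target_probs, memory, ratio_threshold=0.5):
    """Return True if the draft token is accepted at this position.

    target_probs maps tokens to target-model probabilities at this position.
    Standard greedy verification accepts only an exact match with the target
    argmax. The assumed rescue path accepts a mismatching draft token when
    (a) the pair is a recurring divergence in memory and (b) the ratio of
    its target probability to the argmax probability clears the threshold.
    """
    target_token = max(target_probs, key=target_probs.get)
    if draft_token == target_token:
        return True  # exact match: standard acceptance

    if memory.is_candidate(draft_token, target_token):
        ratio = target_probs.get(draft_token, 0.0) / target_probs[target_token]
        if ratio >= ratio_threshold:
            return True  # rescued by the probability-ratio gate

    # Rejected: remember the divergence so it can become a candidate later.
    memory.record(draft_token, target_token)
    return False


memory = CorrectionMemory(min_count=2)
probs = {"big": 0.5, "large": 0.4, "tiny": 0.1}
results = [verify_token("large", probs, memory) for _ in range(3)]
```

In this toy run, "large" is rejected twice while the memory accumulates the ("large", "big") pair, then accepted on the third attempt because its probability ratio (0.4 / 0.5) clears the assumed threshold. A low-probability token such as "tiny" would stay rejected no matter how often it recurs, which is the role the gate plays in this sketch.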
2023
Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?
Yuwei Bao | Keunwoo Yu | Yichi Zhang | Shane Storks | Itamar Bar-Yossef | Alex de la Iglesia | Megan Su | Xiao Zheng | Joyce Chai
Findings of the Association for Computational Linguistics: EMNLP 2023
Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need to have a sophisticated understanding of the user as well as the environment, and make timely, accurate decisions on when and what to say. To address this issue, we created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG), based on natural interaction between a human user and a human instructor. We further proposed two tasks: User and Environment Understanding, and Instructor Decision Making. We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance. Our quantitative, qualitative, and human evaluation results show that these models can demonstrate fair performance in some cases with no task-specific training, but fast and reliable adaptation remains a significant challenge. Our benchmark and baselines will provide a stepping stone for future work on situated task guidance.