Ke Wang
Renmin
2026
MTP-RL: Acceleration of Reinforcement Learning Rollouts with Policy-Aligned Multi-Token Prediction
Ke Wang | Aohan Zeng | Zhengxiao Du | Yuxuan Hu | Bohan Zhang | Xinyi Wang | Jie Tang | Jing Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement learning (RL) is widely applied to boost the performance of pretrained models, yet its training efficiency is severely constrained by rollout generation. While speculative decoding based on multi-token prediction (MTP) offers a potential acceleration pathway, its widespread adoption is hindered by the absence of MTP in vanilla pretrained models and the rapid degradation of the MTP acceptance length in RL training. To address these issues, this paper proposes MTP-RL, a two-stage framework that pioneers effective training of MTPs in RL and accelerates the rollout phase for diverse models. It involves a pipeline to equip the multi-layer parameter-sharing MTP for all models and an innovative advantage-aware MTP optimization strategy to facilitate policy-aligned training of MTPs. Experiments demonstrate that our method not only achieves stable growth of acceptance length during RL training, but also accelerates RL rollouts, achieving an average 23.1%–55.3% reduction in rollout time compared to baselines.
Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts
Sijia Luo | Xiaokang Zhang | Yuxuan Hu | Bohan Zhang | Ke Wang | Jinbo Su | Mengshu Sun | Lei Liang | Jing Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement Learning (RL) has become essential for eliciting complex reasoning capabilities in Large Language Models (LLMs). However, the substantial memory overhead of storing Key-Value (KV) caches during long-horizon rollouts acts as a critical bottleneck, often prohibiting efficient training on limited hardware. While existing KV compression techniques offer a remedy for inference, directly applying them to RL training induces a severe policy mismatch, leading to catastrophic performance collapse. To address this, we introduce Sparse-RL, which empowers stable RL training under sparse rollouts. We show that instability arises from a fundamental policy mismatch among the dense old policy, the sparse sampler policy, and the learner policy. To mitigate this issue, Sparse-RL incorporates Sparsity-Aware Rejection Sampling and Importance-based Reweighting to correct the off-policy bias introduced by compression-induced information loss. Experimental results show that Sparse-RL reduces rollout overhead compared to dense baselines while preserving the performance. Furthermore, Sparse-RL inherently implements sparsity-aware training, significantly enhancing model robustness during sparse inference deployment.
GUI0: Self-Evolving Foundational GUI Agents in Super App Ecosystems
Xinyi Wang | Wei Dai | Kyle Qiao | Ke Wang | Peng Chen | Gang Cao | Kangqin | Zhongpu Wang | Xiaode Zhang | Yanming Liu | Jihao Gu | Jingtao Xu | Gong Zhi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automated interaction with graphical user interfaces (GUIs) is central to General Artificial Intelligence yet remains challenging within Super App ecosystems, characterized by non-standard rendering and absent accessibility metadata. While GUI agents often rely on explicit accessibility trees or static imitation, they are less explored for dynamic environments marked by sparse feedback and implicit visual cues. We present GUI0, a framework synergizing autonomous data synthesis with dual-agent co-evolution. GUI0 establishes a domain-aware foundation model via synthesized corpora and employs curriculum-driven reinforcement learning, where a curriculum agent generates boundary tasks to optimize an actor agent. Empirical results demonstrate three key advantages: (1) State-of-the-art performance on the SuperAPP benchmark, outperforming Gemini-2.5-Pro and Claude-4-Sonnet; (2) universal efficacy across diverse base models, consistently yielding substantial improvements on both Qwen2.5-VL and GUI-Owl variants; and (3) robust zero-shot generalization to standard GUIs (e.g., +62.7% on ScreenSpot Pro).
2025
SAM Decoding: Speculative Decoding via Suffix Automaton
Yuxuan Hu | Ke Wang | Xiaokang Zhang | Fanjin Zhang | Cuiping Li | Hong Chen | Jing Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Speculative decoding (SD) has been demonstrated as an effective technique for lossless LLM inference acceleration. Retrieval-based SD methods, one kind of model-free method, have yielded promising speedup, but they often rely on single retrieval resources and inefficient retrieval methods, and are constrained to certain tasks. This paper presents a novel retrieval-based speculative decoding method that adapts the suffix automaton (SAM) for efficient and accurate draft generation by utilizing the generating text sequence and static text corpus. Unlike existing n-gram matching methods, SAM-Decoding finds the exact longest suffix match, achieving an average time complexity of O(1) per generation step of SAM update and suffix retrieval. It can also integrate with existing methods, adaptively selecting a draft generation strategy based on match length to generalize to broader domains. Extensive experiments on Spec-Bench show that our method is 18% faster than other retrieval-based SD methods. Additionally, when combined with advanced EAGLE-2, it provides an additional speedup of 3.28% – 11.13% across various-sized LLM backbones.
RACQC: Advanced Retrieval-Augmented Generation for Chinese Query Correction
Jinbo Su | Lingzhe Gao | Wei Li | Shihao Liu | Haojie Lei | Xinyi Wang | Yuanzhao Guo | Ke Wang | Daiting Shi | Dawei Yin
Findings of the Association for Computational Linguistics: EMNLP 2025
In web search scenarios, erroneous queries frequently degrade users’ experience through irrelevant results, underscoring the pivotal role of Chinese Spelling Check (CSC) systems. Although large language models (LLMs) exhibit remarkable capabilities across many tasks, they face critical challenges in the CSC scenario: (1) poor generalization to rare entities in open-domain searches, and (2) failure to adapt to temporal entity variations due to static parameters, resulting in serious over-correction issues. To tackle this, we present RACQC, a Chinese Query Correction system with Retrieval-Augmented Generation (RAG) and multi-task learning. Specifically, our approach (1) integrates dynamic knowledge retrieval through entity-centric RAG to address rare entities and innovatively proposes an entity-title collaborative corpus, and (2) employs contrastive correction tasks to mitigate LLM over-correction tendencies. Furthermore, we propose MDCQC, a Multi-Domain Chinese Query Correction benchmark to test the model’s entity correction capabilities. Extensive experiments on several datasets show that RACQC significantly outperforms existing baselines in CSC tasks. Specifically, RACQC achieves a maximum improvement of +9.92% on the search scenario benchmark and +3.2% on the general-domain dataset under the F1 metric.
Co-authors
- Yuxuan Hu 3
- Xinyi Wang 3
- Jinbo Su 2
- Bohan Zhang 2
- Jing Zhang 2
- Xiaokang Zhang 2
- Gang Cao 1
- Hong Chen 1
- Peng Chen 1
- Wei Dai 1
- Zhengxiao Du 1
- Lingzhe Gao 1
- Jihao Gu 1
- Yuanzhao Guo 1
- Kangqin 1
- Haojie Lei 1
- Cuiping Li 1
- Wei Li 1
- Lei Liang 1
- Yanming Liu 1
- Shihao Liu 1
- Sijia Luo 1
- Kyle Qiao 1
- Daiting Shi 1
- Mengshu Sun 1
- Jie Tang 1
- Zhongpu Wang 1
- Jingtao Xu 1
- Dawei Yin 1
- Aohan Zeng 1
- Fanjin Zhang 1
- Jing Zhang 1
- Xiaode Zhang 1
- Gong Zhi 1