Ken Deng
2026
ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants
Pei Wang | Yanan Wu | Xiaoshuai Song | Weixun Wang | Gengru Chen | Zhongwen Li | Kezhong Yan | Qi Liu | Ken Deng | Shuaibing Zhao | Shaopan Xiong | Xuepeng Liu | Xuefeng Chen | Wanxi Deng | Wenbo Su | Bo Zheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language model (LLM)-based agents are increasingly deployed in e-commerce shopping. To perform thorough, user-tailored product searches, agents should interpret personal preferences, engage in multi-turn dialogues, and ultimately retrieve and discriminate among highly similar products. However, existing research has yet to provide a unified simulation environment that consistently captures all of these aspects, and it typically focuses solely on evaluation benchmarks without training support. In this paper, we introduce ShopSimulator, a large-scale and challenging Chinese shopping environment. Leveraging ShopSimulator, we evaluate LLMs across diverse scenarios, finding that even the best-performing models achieve a full-success rate below 40%. Error analysis reveals that agents struggle with deep search and product selection in long trajectories, fail to balance the use of personalization cues, and fail to engage effectively with users. Further training exploration provides practical guidance for overcoming these weaknesses, with the combination of supervised fine-tuning (SFT) and reinforcement learning (RL) yielding significant performance improvements.
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li | Chenchen Zhang | Ruilin Lv | Ao Liu | Ken Deng | Yuanxing Zhang | Jiaheng Liu | Bo Zhou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Large Language Models (LLMs) excel at algorithmic code generation, they struggle with front-end development, where correctness is judged on rendered pixels and interaction. We present ReLook, an agentic, vision-grounded reinforcement learning framework that empowers an agent to close a robust generate–diagnose–refine loop by invoking a multimodal LLM (MLLM) as a tool. During training, the agent employs an MLLM-in-the-loop to serve as a visual critic, evaluating code via screenshots and providing actionable feedback. Crucially, we enforce a strict zero-reward policy for invalid renders to guarantee renderability and mitigate reward hacking. To prevent behavioral collapse, we introduce Forced Optimization, a strict acceptance rule that admits only improving revisions, yielding monotonically better trajectories. At inference, we decouple the critic and run a lightweight, critic-free self-edit cycle, keeping latency comparable to base decoding while retaining most of the gains. Across three widely used benchmarks, ReLook consistently outperforms strong baselines in vision-grounded front-end code generation, highlighting the benefits of agentic perception, visual rewards, and training–inference decoupling.
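The two training-time rules named in the abstract (zero reward for invalid renders, and an acceptance rule that admits only improving revisions) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `shaped_reward`, `forced_optimization`, `revise`, and `critic` are hypothetical, the critic is assumed to return `None` when the page fails to render, and the toy critic in the usage lines simply scores valid pages by length.

```python
from typing import Callable, List, Optional, Tuple


def shaped_reward(critic_score: Optional[float]) -> float:
    """Assumed convention: None means the render was invalid, so reward is zero."""
    return 0.0 if critic_score is None else critic_score


def forced_optimization(
    initial_code: str,
    revise: Callable[[str], str],               # hypothetical: proposes a revision
    critic: Callable[[str], Optional[float]],   # hypothetical: visual critic score
    max_steps: int = 3,
) -> Tuple[str, List[float]]:
    """Keep a revision only if its reward strictly improves on the best so far."""
    best_code = initial_code
    best_reward = shaped_reward(critic(best_code))
    accepted_rewards = [best_reward]
    for _ in range(max_steps):
        candidate = revise(best_code)
        candidate_reward = shaped_reward(critic(candidate))
        if candidate_reward > best_reward:
            # Accept: the trajectory of accepted rewards stays strictly increasing.
            best_code, best_reward = candidate, candidate_reward
            accepted_rewards.append(candidate_reward)
        # Otherwise the candidate is discarded and the next revision
        # starts again from the current best.
    return best_code, accepted_rewards


# Toy usage: a fixed queue of revisions and a length-based stand-in critic.
revisions = iter(["<html>ab</html>", "broken", "<html>a</html>"])
toy_critic = lambda code: float(len(code)) if code.startswith("<html>") else None
best, rewards = forced_optimization(
    "<html></html>", lambda code: next(revisions), toy_critic, max_steps=3
)
```

In this toy run the unrenderable revision receives zero reward and the shorter valid page is rejected, so only the first revision is accepted.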
2025
M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation
Jiaheng Liu | Ken Deng | Congnan Liu | Jian Yang | Shukai Liu | He Zhu | Peng Zhao | Linzheng Chai | Yanan Wu | JinKe JinKe | Ge Zhang | Zekun Moore Wang | Guoan Zhang | Yingshui Tan | Bangyu Xiang | Zhaoxiang Zhang | Wenbo Su | Bo Zheng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Repository-level code completion has drawn great attention in software engineering, and several benchmarks have been introduced. However, existing repository-level code completion benchmarks usually cover only a limited number of languages (fewer than 5), so they cannot evaluate the general code intelligence abilities of existing code Large Language Models (LLMs) across different languages. In addition, existing benchmarks usually report overall average scores across languages, ignoring fine-grained abilities in different completion scenarios. Therefore, to facilitate research on code LLMs in multilingual scenarios, we propose a massively multilingual repository-level code completion benchmark covering 18 programming languages (called M2RC-EVAL). It provides two types of fine-grained annotations (i.e., bucket-level and semantic-level) on different completion scenarios, which we obtain from the parsed abstract syntax tree. Moreover, we curate a massively multilingual instruction corpus, M2RC-INSTRUCT, to improve the repository-level code completion abilities of existing code LLMs. Comprehensive experimental results demonstrate the effectiveness of M2RC-EVAL and M2RC-INSTRUCT.
Co-authors
- Jiaheng Liu 2
- Wenbo Su 2
- Yanan Wu 2
- Bo Zheng 2
- Linzheng Chai 1
- Gengru Chen 1
- Xuefeng Chen 1
- Wanxi Deng 1
- JinKe JinKe 1
- Zhongwen Li 1
- Yuhang Li 1
- Qi Liu 1
- Xuepeng Liu 1
- Congnan Liu 1
- Shukai Liu 1
- Ao Liu 1
- Ruilin Lv 1
- Xiaoshuai Song 1
- Yingshui Tan 1
- Pei Wang 1
- Weixun Wang 1
- Zekun Moore Wang 1
- Bangyu Xiang 1
- Shaopan Xiong 1
- Kezhong Yan 1
- Jian Yang 1
- Ge Zhang 1
- Guoan Zhang 1
- Zhaoxiang Zhang 1
- Chenchen Zhang 1
- Yuanxing Zhang 1
- Shuaibing Zhao 1
- Peng Zhao 1
- Bo Zhou 1
- He Zhu 1
Venues
- ACL 3