Kun Zeng
2026
RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion
Yu Huo | Kun Zeng | Siyu Zhang | Yuquan LU | Cheng Yang | Yifu Guo | Xiaoying Tang
Findings of the Association for Computational Linguistics: ACL 2026
Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our offline labeling module, ChunkShapley, estimates signed per-chunk effects via teacher-forced probing, feeds them into a lightweight surrogate game that captures saturation and interference, computes exact Shapley values for small retrieval sets, and selects a decoding-optimal coalition through bounded post-verification with the frozen generator. The verified <KEEP> / <DROP> decisions and retrieval triggers are then distilled into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval.
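The abstract's step of computing exact Shapley values over a small retrieval set can be illustrated by brute-force enumeration of coalitions. This is a minimal sketch under stated assumptions, not the authors' ChunkShapley code. The surrogate game (`make_surrogate_game`) is hypothetical: additive signed effects, a cap standing in for saturation, and pairwise penalties standing in for interference. The keep-if-positive rule is also hypothetical. The paper instead selects its coalition through bounded post-verification with the frozen generator, which is not reproduced here. The numbers are made up.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Coalition = FrozenSet[int]


def exact_shapley(n: int, value: Callable[[Coalition], float]) -> List[float]:
    """Exact Shapley values for players 0..n-1 by enumerating every coalition.

    Cost is O(n * 2^n) game evaluations, so this is only practical for small n
    (e.g. a handful of retrieved chunks).
    """
    phi = [0.0] * n
    n_fact = factorial(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i
            weight = factorial(size) * factorial(n - size - 1) / n_fact
            for subset in combinations(others, size):
                s = frozenset(subset)
                # marginal contribution of player i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def make_surrogate_game(
    effects: Sequence[float],
    interference: Dict[Tuple[int, int], float],
    cap: float,
) -> Callable[[Coalition], float]:
    """HYPOTHETICAL surrogate game, not the paper's definition.

    effects[i]        : signed per-chunk effect (e.g. from teacher-forced probing)
    cap               : saturation, the summed effect cannot exceed this value
    interference[a, b]: penalty paid when conflicting chunks a and b co-occur
    """
    def value(coalition: Coalition) -> float:
        gain = min(sum(effects[i] for i in coalition), cap)
        penalty = sum(
            w for (a, b), w in interference.items()
            if a in coalition and b in coalition
        )
        return gain - penalty

    return value


# Toy example with made-up numbers: chunks 0 and 1 help, chunk 2 hurts,
# and chunks 0 and 1 partially conflict.
effects = [0.5, 0.3, -0.2]
game = make_surrogate_game(effects, interference={(0, 1): 0.1}, cap=0.6)
phi = exact_shapley(len(effects), game)

# HYPOTHETICAL selection rule: keep chunks with a positive Shapley value.
keep = [i for i, p in enumerate(phi) if p > 0]
print([round(p, 4) for p in phi], keep)
```

Because the number of coalitions doubles with every added chunk, exact values are only affordable for small retrieval sets, which matches the abstract's restriction. The Shapley values also sum to the value of the full set minus the empty set, a quick sanity check on any implementation.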
2025
LLM-Enhanced Query Generation and Retrieval Preservation for Task-Oriented Dialogue
Jiale Chen | Xuelian Dong | Wenxiu Xie | Ru Peng | Kun Zeng | Tianyong Hao
Findings of the Association for Computational Linguistics: ACL 2025
Knowledge retrieval and response generation are fundamental to task-oriented dialogue systems. However, dialogue context frequently contains noisy or irrelevant information, which leads to sub-optimal knowledge retrieval. One possible approach is to manually annotate standard queries for each dialogue, but this is hindered by data scarcity, as human annotation is costly. To address this challenge, we propose an LLM-enhanced model of query-guided knowledge retrieval for task-oriented dialogue. It generates high-quality queries for knowledge retrieval in task-oriented dialogue using only low-resource query annotations. To strengthen the correlation between response generation performance and knowledge retrieval performance, we propose a retrieval preservation mechanism that further selects the most relevant knowledge from the retrieved top-K records and explicitly incorporates it into prompts that guide the generator during response generation. Experiments on three standard benchmarks demonstrate that our model and mechanism outperform the previous state of the art by 3.26% on average across two widely used evaluation metrics.