Siyu Zhang
2026
RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion
Yu Huo | Kun Zeng | Siyu Zhang | Yuquan LU | Cheng Yang | Yifu Guo | Xiaoying Tang
Findings of the Association for Computational Linguistics: ACL 2026
Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our offline labeling module, ChunkShapley, estimates signed per-chunk effects via teacher-forced probing, feeds them into a lightweight surrogate game that captures saturation and interference, computes exact Shapley values for small retrieval sets, and selects a decoding-optimal coalition through bounded post-verification with the frozen generator. The verified <KEEP> / <DROP> decisions and retrieval triggers are then distilled into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval.
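To make the "exact Shapley values for small retrieval sets" step concrete, here is a minimal sketch that enumerates all coalitions of a tiny chunk set. It is not the authors' ChunkShapley implementation: the function `exact_shapley`, the chunk names `a`, `b`, `c`, and the value function `toy_value` are hypothetical stand-ins for the surrogate game, and the zero threshold used for labeling is a simplification that replaces the paper's post-verification with the frozen generator.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by enumerating every coalition.

    Cost grows as 2^n, so this is only practical for small player sets.
    `value` maps a frozenset of players to a real-valued payoff.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Standard Shapley weight for a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical surrogate game (an assumption, not taken from the paper).

    Chunks "a" and "b" only help when both are present (complementary context);
    chunk "c" always hurts (conflicting context).
    """
    gain = 1.0 if {"a", "b"} <= coalition else 0.0
    penalty = 0.3 if "c" in coalition else 0.0
    return gain - penalty


chunks = ["a", "b", "c"]
phi = exact_shapley(chunks, toy_value)

# Simplified labeling rule (assumption): keep chunks with positive contribution.
labels = {c: "<KEEP>" if phi[c] > 0 else "<DROP>" for c in chunks}
print(phi)
print(labels)
```

In this toy game the two complementary chunks split the joint gain equally and the conflicting chunk receives a negative value, so a per-chunk score computed in isolation (where `a` and `b` each look useless) would mislabel them, while the coalition-aware score does not.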
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning
Beining Wu | Fuyou Mao | Jiong Lin | Cheng Yang | Jiaxuan Lu | Yifu Guo | Siyu Zhang | Yifan Wu | Ying Huang | Fu Li
Findings of the Association for Computational Linguistics: ACL 2026
Generative engines (GEs) are reshaping information access by replacing ranked links with citation-grounded answers, yet current Generative Engine Optimization (GEO) methods optimize each instance in isolation, unable to accumulate or transfer effective strategies across tasks and engines. We reframe GEO as a strategy learning problem and propose MAGEO, a multi-agent framework in which coordinated planning, editing, and fidelity-aware evaluation serve as the execution layer, while validated editing patterns are progressively distilled into reusable, engine-specific optimization skills. To enable controlled assessment, we introduce a Twin Branch Evaluation Protocol for causal attribution of content edits and DSV-CF, a dual-axis metric that unifies semantic visibility with attribution accuracy. We further release MSME-GEO-Bench, a multi-scenario, multi-engine benchmark grounded in real-world queries. Experiments on three mainstream engines show that MAGEO substantially outperforms heuristic baselines in both visibility and citation fidelity, with ablations confirming that engine-specific preference modeling and strategy reuse are central to these gains, suggesting a scalable learning-driven paradigm for trustworthy GEO. Code is available at https://github.com/Wu-beining/MAGEO.