Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification

Aofan Liu, Song Shiyuan, Haoxuan Li, Cehao Yang, Yiyan Qi


Abstract
The escalating complexity of modern codebases has intensified the need for code retrieval systems capable of interpreting cross-component change intents, a capability fundamentally absent in conventional function-level search paradigms. While recent research has improved alignment between queries and code snippets, retrieving contextually relevant code for a given change request remains underexplored. To bridge this gap, we present RepoAlignBench, the first benchmark designed to evaluate repository-level code retrieval for change request-driven scenarios, encompassing 52k columns. The benchmark shifts the paradigm from function-centric retrieval to holistic repository analysis. In addition, we propose ReflectCode, an adversarial reflection-augmented dual-tower architecture featuring disentangled code_encoder and doc_encoder towers. Our framework dynamically integrates syntactic patterns, function dependencies, and semantic expansion intent through an LLM. Comprehensive evaluations demonstrate that ReflectCode achieves improvements of 12.2% in Top-5 Accuracy and 7.1% in Recall over state-of-the-art baselines.
Anthology ID:
2025.findings-emnlp.1147
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21034–21049
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1147/
DOI:
10.18653/v1/2025.findings-emnlp.1147
Cite (ACL):
Aofan Liu, Song Shiyuan, Haoxuan Li, Cehao Yang, and Yiyan Qi. 2025. Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 21034–21049, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification (Liu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1147.pdf
Checklist:
 2025.findings-emnlp.1147.checklist.pdf