SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization

Dhruv Gupta; Gayathri Ganesh Lakshmy; Yiqing Xie

doi:10.18653/v1/2025.findings-emnlp.1365

Abstract

In this work, we conduct an in-depth analysis of code retrieval by systematically masking specific features while preserving code functionality. Our discoveries include: (1) although trained on code, current retrievers heavily rely on surface-level textual features (e.g., docstrings, identifier names), and (2) they exhibit a strong bias towards well-documented code, even if the documentation is irrelevant. Based on our discoveries, we propose SACL, a framework that enriches textual information and reduces bias by augmenting code or structural knowledge with semantic information. Extensive experiments show that SACL substantially improves code retrieval (e.g., by 12.8% / 9.4% / 7.0% Recall@1 on HumanEval / MBPP / SWE-Bench-Lite), which also leads to better code generation performance (e.g., by 4.88% Pass@1 on HumanEval).

Anthology ID: 2025.findings-emnlp.1365
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 25052–25065
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1365/
DOI: 10.18653/v1/2025.findings-emnlp.1365
Cite (ACL): Dhruv Gupta, Gayathri Ganesh Lakshmy, and Yiqing Xie. 2025. SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25052–25065, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization (Gupta et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1365.pdf
Checklist: 2025.findings-emnlp.1365.checklist.pdf
