CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
Xiangyang Li, Kuicai Dong, Yi Quan Lee, Wei Xia, Hao Zhang, Xinyi Dai, Yasheng Wang, Ruiming Tang
Abstract
Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code across domains and tasks. Moreover, many models have begun to overfit existing leaderboards, limiting their generalizability and real-world applicability. Addressing this gap, we present CoIR (**Co**de **I**nformation **R**etrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. CoIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains. We first discuss the construction of CoIR and its diverse dataset composition. We then evaluate ten widely used retrieval models on CoIR, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. CoIR also introduces a simple yet effective Python framework, which defines various advanced modes to help researchers evaluate their models. It shares the same data schema as other popular benchmarks such as MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through CoIR, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems.
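To make the framework mentioned in the abstract concrete, the snippet below is a minimal sketch of running a CoIR evaluation. It assumes the `coir-eval` pip package and the `get_tasks`/`COIR`/`YourCustomDEModel` interface described in the project's public README; the task name, model identifier, and output path are illustrative, not prescribed by the paper.

```python
# Minimal sketch of a CoIR evaluation run, assuming the interface from
# the project's README (pip install coir-eval). Task name, model
# identifier, and output folder below are illustrative assumptions.
from coir.data_loader import get_tasks
from coir.evaluation import COIR
from coir.models import YourCustomDEModel

# Wrap a dual-encoder embedding model; e5-base-v2 is just an example.
model = YourCustomDEModel(model_name="intfloat/e5-base-v2")

# Select one of CoIR's curated datasets; get_tasks takes a list of names.
tasks = get_tasks(tasks=["codetrans-dl"])

# Because CoIR shares the BEIR/MTEB data schema, queries and corpora are
# handled as plain id-to-text mappings, enabling cross-benchmark reuse.
evaluation = COIR(tasks=tasks, batch_size=128)

# Retrieval metrics (e.g., NDCG@10) are written per task to output_folder.
results = evaluation.run(model, output_folder="results/e5-base-v2")
print(results)
```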
- Anthology ID: 2025.acl-long.1072
- Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 22074–22091
- URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1072/
- Cite (ACL): Xiangyang Li, Kuicai Dong, Yi Quan Lee, Wei Xia, Hao Zhang, Xinyi Dai, Yasheng Wang, and Ruiming Tang. 2025. CoIR: A Comprehensive Benchmark for Code Information Retrieval Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22074–22091, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): CoIR: A Comprehensive Benchmark for Code Information Retrieval Models (Li et al., ACL 2025)
- PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1072.pdf