CodeRAG-Bench: Can Retrieval Augment Code Generation?
Zora Zhiruo Wang | Akari Asai | Xinyan Velocity Yu | Frank F. Xu | Yiqing Xie | Graham Neubig | Daniel Fried
Findings of the Association for Computational Linguistics: NAACL 2025
While language models (LMs) excel at generating code, many programs are difficult to generate using only parametric knowledge. Despite the success of retrieval-augmented generation (RAG) in text-centric tasks, its potential for code generation remains under-explored. This work introduces CodeRAG-Bench, a holistic retrieval-augmented code generation benchmark covering tasks such as basic programming, open-domain, and repository-level problems, and providing reproducible evaluations of both retrieval and end-to-end code generation performance. We further create a diverse, open datastore for code retrieval, aggregating sources such as competition solutions, tutorials, library documentation, StackOverflow posts, and GitHub repositories. Based on CodeRAG-Bench, we conduct large-scale evaluations of 10 retrievers and 10 LMs, systematically analyzing when retrieval benefits code generation models and identifying remaining challenges. We find that while retrieving high-quality contexts improves code generation, retrievers often struggle to fetch useful contexts, and generators face limitations in using those contexts effectively. We hope CodeRAG-Bench encourages further development in code-oriented RAG methods.
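To make the retrieve-then-generate setting described in the abstract concrete, the sketch below shows the general pattern a benchmark like this evaluates: score datastore entries against a coding problem, keep the top-k contexts, and prepend them to the prompt given to a code LM. This is a minimal illustration, not the benchmark's actual implementation; the datastore entries, the bag-of-words scorer (a stand-in for BM25 or a dense retriever), and the prompt format are all hypothetical.

```python
# Minimal sketch of retrieval-augmented code generation.
# Everything here is illustrative: the datastore, scorer, and prompt
# format are assumptions, not the CodeRAG-Bench implementation.
from collections import Counter
import math

# Hypothetical datastore entries (the paper's datastore aggregates
# competition solutions, tutorials, library docs, StackOverflow
# posts, and GitHub repositories).
DATASTORE = [
    "Tutorial: use itertools.permutations to enumerate orderings of a list.",
    "Docs: pandas.DataFrame.groupby aggregates rows sharing a key.",
    "StackOverflow: sort a dict by value with sorted(d.items(), key=...).",
]

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Sparse cosine similarity over word counts; a real system would use
    # a lexical retriever (e.g., BM25) or a trained dense embedder.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = bag_of_words(query)
    ranked = sorted(DATASTORE, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(problem: str) -> str:
    # Retrieved contexts are prepended to the problem statement before a
    # code LM generates a solution; end-to-end quality depends on both
    # retrieval and the generator's ability to use the contexts.
    contexts = "\n".join(retrieve(problem))
    return f"# Relevant context:\n{contexts}\n\n# Task:\n{problem}\n"

if __name__ == "__main__":
    print(build_prompt("Sort a dictionary by its values in Python."))
```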