Abstract
The ability of large language models (LLMs) to generate code has improved year by year, but research on repository-level code generation remains limited. Repository-level generation must reference related code snippets spread across multiple files: conventionally, related files are retrieved by computing similarity between code snippets, and the retrieved files are given to an LLM as context for generation. This paper proposes a method for retrieving related files (code search) that computes similarity not between the code snippets themselves but between texts into which an LLM converts those snippets. We confirm that this text conversion improves code-search accuracy.- Anthology ID:
- 2024.naacl-srw.15
- Volume:
- Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 130–137
- URL:
- https://aclanthology.org/2024.naacl-srw.15
- Cite (ACL):
- Mizuki Kondo, Daisuke Kawahara, and Toshiyuki Kurabayashi. 2024. Improving Repository-level Code Search with Text Conversion. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 130–137, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Improving Repository-level Code Search with Text Conversion (Kondo et al., NAACL 2024)
- PDF:
- https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.naacl-srw.15.pdf
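The retrieval idea the abstract describes can be sketched as follows. This is a minimal, illustrative Python sketch, not the paper's implementation: the LLM-based code-to-text step is replaced by a hypothetical identifier-splitting stub (`describe_code`), and the embedding and similarity computation by bag-of-words cosine similarity. All names and helpers here are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def describe_code(snippet: str) -> str:
    """Stand-in for the paper's LLM-based code-to-text conversion.

    The paper converts code snippets into natural-language text with an
    LLM; here we fake that step with a trivial heuristic (splitting
    camelCase / snake_case identifiers into words) so the sketch runs.
    """
    words = []
    for token in re.findall(r"[A-Za-z]+", snippet):
        words.extend(w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+", token))
    return " ".join(words)


def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real text embedding model.
    return Counter(text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query_snippet: str, repo: dict) -> list:
    """Rank repository files by similarity of their *text descriptions*,
    rather than by similarity of the raw code snippets."""
    q = embed(describe_code(query_snippet))
    scores = {path: cosine(q, embed(describe_code(code)))
              for path, code in repo.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy repository: two files, one clearly related to the query.
repo = {
    "db/user_repository.py": "def load_user_record(user_id): ...",
    "ui/render_button.py": "def render_button(label): ...",
}
print(search("def fetch_user_record(uid): ...", repo))
# → ['db/user_repository.py', 'ui/render_button.py']
```

Because both the query and the repository files are compared in the text domain, surface-level syntactic differences between snippets matter less than what the snippets are about, which is the intuition behind the paper's approach.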