ProConSuL: Project Context for Code Summarization with LLMs
Vadim Lomshakov, Andrey Podivilov, Sergey Savin, Oleg Baryshnikov, Alena Lisevych, Sergey Nikolenko
Abstract
We propose Project Context for Code Summarization with LLMs (ProConSuL), a new framework that provides a large language model (LLM) with precise information about code structure obtained from program analysis tools, such as a compiler or IDE language services, and uses a task decomposition derived from that structure. ProConSuL builds a call graph to provide context from callees and uses a two-phase training method (SFT + preference alignment) to train the model to use the project context. We also provide a new evaluation benchmark for C/C++ functions and a set of proxy metrics. Experimental results demonstrate that ProConSuL significantly improves code summaries and reduces the number of hallucinations compared to the base model (CodeLlama-7B-instruct). We make our code and dataset available at https://github.com/TypingCat13/ProConSuL.
- Anthology ID:
- 2024.emnlp-industry.65
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, US
- Editors:
- Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 866–880
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-industry.65/
- DOI:
- 10.18653/v1/2024.emnlp-industry.65
- Cite (ACL):
- Vadim Lomshakov, Andrey Podivilov, Sergey Savin, Oleg Baryshnikov, Alena Lisevych, and Sergey Nikolenko. 2024. ProConSuL: Project Context for Code Summarization with LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 866–880, Miami, Florida, US. Association for Computational Linguistics.
- Cite (Informal):
- ProConSuL: Project Context for Code Summarization with LLMs (Lomshakov et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-industry.65.pdf
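
The abstract describes building a call graph so that each function is summarized with context from its callees. A minimal sketch of that ordering idea follows. It is an illustration under assumptions, not the authors' implementation: the call graph is a hand-written dictionary rather than compiler output, and `callee_first_order`, `summarize_project`, and the `summarize` callback (standing in for an LLM call) are hypothetical names.

```python
from typing import Callable, Dict, List


def callee_first_order(call_graph: Dict[str, List[str]]) -> List[str]:
    """Order functions so that every callee comes before its callers.

    Uses a post-order depth-first traversal. Recursive cycles are broken
    by not re-entering a function that is already being visited.
    """
    order: List[str] = []
    state: Dict[str, int] = {}  # 1 = in progress, 2 = finished

    def visit(fn: str) -> None:
        if state.get(fn):
            return
        state[fn] = 1
        for callee in call_graph.get(fn, []):
            visit(callee)
        state[fn] = 2
        order.append(fn)

    for fn in call_graph:
        visit(fn)
    return order


def summarize_project(
    call_graph: Dict[str, List[str]],
    summarize: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Summarize each function, passing in summaries of its callees.

    `summarize` is a hypothetical stand-in for a model call: it receives
    the function name and a mapping from callee name to callee summary.
    """
    summaries: Dict[str, str] = {}
    for fn in callee_first_order(call_graph):
        context = {
            callee: summaries[callee]
            for callee in call_graph.get(fn, [])
            if callee in summaries  # absent only when a cycle was broken
        }
        summaries[fn] = summarize(fn, context)
    return summaries


# Hypothetical three-function project: main calls parse and run; run calls parse.
example_graph = {"main": ["parse", "run"], "run": ["parse"], "parse": []}
example_summaries = summarize_project(
    example_graph,
    lambda fn, ctx: f"{fn} uses {sorted(ctx)}",
)
```

Processing callees first means a caller's prompt can include short summaries of the functions it invokes instead of their full bodies, which is one plausible way to realize the task decomposition the abstract mentions.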