Analyzing the Performance of Large Language Models on Code Summarization

Rajarshi Haldar; Julia Hockenmaier

Abstract

Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code, particularly code summarization and code generation. We show that for code summarization, the performance of these models on individual examples often depends on the amount of (subword) token overlap between the code and the corresponding reference natural language descriptions in the dataset. This token overlap arises because the reference descriptions in standard datasets (corresponding to docstrings in large code bases) are often highly similar to the names of the functions they describe. We also show that this token overlap occurs largely in the function names of the code, and we compare the relative performance of these models after removing function names versus removing code structure. Finally, we show that using multiple evaluation metrics such as BLEU and BERTScore provides very little additional insight, since these metrics are highly correlated with each other.
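
The overlap described in the abstract can be illustrated with a small sketch. It is not the authors' measurement: it assumes a simple identifier splitter (snake_case and camelCase) in place of a model's subword tokenizer, and the names split_identifier_tokens and token_overlap, as well as the toy function, are hypothetical. It computes the fraction of summary tokens that also occur in the code, before and after the function name is replaced by a placeholder.

```python
import re


def split_identifier_tokens(text: str) -> list[str]:
    """Split text into lowercase word pieces (hypothetical stand-in for a subword tokenizer)."""
    pieces = []
    # Letter runs only, so underscores, digits and punctuation act as separators.
    for word in re.findall(r"[A-Za-z]+", text):
        # Break camelCase / PascalCase words; keep acronym runs such as "HTTP" together.
        pieces.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word))
    return [p.lower() for p in pieces]


def token_overlap(code: str, summary: str) -> float:
    """Fraction of summary tokens that also appear somewhere in the code."""
    code_tokens = set(split_identifier_tokens(code))
    summary_tokens = split_identifier_tokens(summary)
    if not summary_tokens:
        return 0.0
    shared = sum(1 for token in summary_tokens if token in code_tokens)
    return shared / len(summary_tokens)


# Toy example (invented for illustration, not taken from any dataset).
summary = "Read a JSON file"
original = "def read_json_file(path):\n    with open(path) as f:\n        return json.load(f)"
# Same body, with the function name replaced by an uninformative placeholder.
anonymized = "def f(path):\n    with open(path) as f:\n        return json.load(f)"

print(token_overlap(original, summary))    # 0.75
print(token_overlap(anonymized, summary))  # 0.25
```

In this toy case most of the overlap comes from the function name, which is the kind of effect the abstract attributes to docstring-style reference descriptions.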

Anthology ID: 2024.lrec-main.89
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 995–1008
URL: https://preview.aclanthology.org/landing_page/2024.lrec-main.89/
Cite (ACL): Rajarshi Haldar and Julia Hockenmaier. 2024. Analyzing the Performance of Large Language Models on Code Summarization. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 995–1008, Torino, Italia. ELRA and ICCL.
Cite (Informal): Analyzing the Performance of Large Language Models on Code Summarization (Haldar & Hockenmaier, LREC-COLING 2024)
PDF: https://preview.aclanthology.org/landing_page/2024.lrec-main.89.pdf
