Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development

Jesse Phillips, Mo El-Haj, Tracy Hall


Abstract
Source code summaries give developers and maintainers vital information about source code methods. These summaries aid the security of software systems: they improve developer and maintainer understanding of code, with the aim of reducing the number of bugs and vulnerabilities. However, writing these summaries takes up developers’ time, and the summaries are often missing, incomplete, or outdated. Neural source code summarisation addresses these issues by summarising source code automatically. Current solutions use Transformer neural networks to achieve this. We present CodeSumBART - a BART-base model for neural source code summarisation, pretrained on a dataset of Java source code methods and English method summaries. We present a new approach to training Transformers for neural source code summarisation that uses epoch validation results to optimise the performance of the model. We found that in our approach, using larger n-gram precision BLEU metrics for epoch validation, such as BLEU-4, produces better-performing models than other common NLG metrics.
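The abstract's core idea, keeping the training checkpoint whose validation-set summaries score highest on BLEU-4, can be sketched as follows. This is an illustrative pure-Python sketch under our own assumptions, not the authors' implementation: the helper names (`bleu`, `select_best_epoch`) are hypothetical, and in practice a library such as sacrebleu or NLTK would compute BLEU.

```python
import math
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights over 1..max_n-gram
    precisions and the standard brevity penalty. max_n=4 gives BLEU-4."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = _ngrams(cand, n)
        ref_ngrams = _ngrams(ref, n)
        # Clipped n-gram matches, as in standard BLEU.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0  # geometric mean is zero if any precision is zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean


def select_best_epoch(epoch_outputs, references):
    """Return the index of the epoch whose generated summaries achieve
    the highest mean BLEU score against the validation references."""
    scores = [
        sum(bleu(c, r) for c, r in zip(outputs, references)) / len(references)
        for outputs in epoch_outputs
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

The selection loop is the metric-oriented part: rather than stopping on validation loss, each epoch's checkpoint is scored with the target metric (here BLEU-4) and the best-scoring checkpoint is retained.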
Anthology ID:
2024.nlpaics-1.3
Volume:
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month:
July
Year:
2024
Address:
Lancaster, UK
Editors:
Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue:
NLPAICS
Publisher:
International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Pages:
17–31
URL:
https://preview.aclanthology.org/landing_page/2024.nlpaics-1.3/
Cite (ACL):
Jesse Phillips, Mo El-Haj, and Tracy Hall. 2024. Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 17–31, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal):
Metric-Oriented Pretraining of Neural Source Code Summarisation Transformers to Enable more Secure Software Development (Phillips et al., NLPAICS 2024)
PDF:
https://preview.aclanthology.org/landing_page/2024.nlpaics-1.3.pdf