FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation

Yifeng He, Jicheng Wang, Yuyang Rong, Hao Chen


Abstract
Testing is essential to modern software engineering for building reliable software. Given the high costs of manually creating test cases, automated test case generation, particularly methods utilizing large language models, has become increasingly popular. These neural approaches generate semantically meaningful tests that are more maintainable compared with traditional automated testing methods such as fuzzing. However, the diversity and volume of unit tests in current datasets are limited, especially for newer but important languages. In this paper, we present a novel data augmentation technique, *FuzzAug*, that brings the benefits of fuzzing to large language models by incorporating valid testing semantics and providing diverse coverage-guided inputs. Doubling the size of training datasets, FuzzAug improves performance over the baselines significantly. This technique demonstrates the potential of introducing prior knowledge from dynamic software analysis to improve neural test generation, offering significant enhancements in this task. Our code is open-sourced at https://github.com/SecurityLab-UCD/FuzzAug.
Anthology ID:
2025.findings-emnlp.847
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15642–15655
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.847/
DOI:
10.18653/v1/2025.findings-emnlp.847
Cite (ACL):
Yifeng He, Jicheng Wang, Yuyang Rong, and Hao Chen. 2025. FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15642–15655, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation (He et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.847.pdf
Checklist:
 2025.findings-emnlp.847.checklist.pdf