One-to-many testing for code generation from (just) natural language
Mansi Uniyal, Mukul Singh, Gust Verbruggen, Sumit Gulwani, Vu Le
Abstract
MBPP is a popular dataset for evaluating the task of code generation from natural language. Despite its popularity, it has three problems: (1) it relies on providing test cases to generate the right signature, (2) there is poor alignment between the instruction and the evaluation test cases, and (3) the exact phrasing of its tasks is present in training datasets (contamination). We adapt MBPP to emphasize generating code from just natural language by (1) removing ambiguity about the semantics of the task from the descriptions, and (2) evaluating generated code on multiple sets of assertions to account for ambiguity in the syntax. We compare popular open- and closed-weight models on the original (MBPP) and adapted (MBUPP) datasets.
- Anthology ID:
- 2024.findings-emnlp.902
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15397–15402
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.902/
- DOI:
- 10.18653/v1/2024.findings-emnlp.902
- Cite (ACL):
- Mansi Uniyal, Mukul Singh, Gust Verbruggen, Sumit Gulwani, and Vu Le. 2024. One-to-many testing for code generation from (just) natural language. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 15397–15402, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- One-to-many testing for code generation from (just) natural language (Uniyal et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.902.pdf
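The one-to-many evaluation described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's actual harness): a candidate program is accepted if it passes *any one* of several alternative assertion sets, where each set encodes one plausible syntactic reading of the task. The function names `passes` and `one_to_many_pass` are assumptions for this sketch.

```python
def passes(code: str, assertions: list[str]) -> bool:
    """Run the candidate code, then check every assertion in one set."""
    env: dict = {}
    try:
        exec(code, env)      # define the candidate function
        for a in assertions:
            exec(a, env)     # raises AssertionError on failure
        return True
    except Exception:
        return False

def one_to_many_pass(code: str, assertion_sets: list[list[str]]) -> bool:
    """Accept the candidate if at least one assertion set fully passes."""
    return any(passes(code, s) for s in assertion_sets)

# Example: two readings of "check whether a number is even" --
# the function might return a bool, or the strings "even"/"odd".
candidate = "def is_even(n):\n    return n % 2 == 0"
assertion_sets = [
    ["assert is_even(4) is True", "assert is_even(3) is False"],
    ["assert is_even(4) == 'even'", "assert is_even(3) == 'odd'"],
]
print(one_to_many_pass(candidate, assertion_sets))  # -> True (first set passes)
```

Under a strict one-to-one evaluation, a candidate matching only the second reading would be scored incorrect; scoring against multiple assertion sets credits any syntactically valid interpretation of the same instruction.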