The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui


Anthology ID:
2023.nlposs-1.25
Volume:
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Liling Tan, Dmitrijs Milajevs, Geeticka Chauhan, Jeremy Gwinnup, Elijah Rippeth
Venues:
NLPOSS | WS
Publisher:
Association for Computational Linguistics
Pages:
219–244
URL:
https://aclanthology.org/2023.nlposs-1.25
DOI:
10.18653/v1/2023.nlposs-1.25
Cite (ACL):
Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, and Nghi D. Q. Bui. 2023. The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 219–244, Singapore. Association for Computational Linguistics.
Cite (Informal):
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation (Manh et al., NLPOSS-WS 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.nlposs-1.25.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2023.nlposs-1.25.mp4