CodeGenWrangler: Data Wrangling task automation using Code-Generating Models

Ashlesha Akella, Abhijit Manatkar, Krishnasuri Narayanam, Sameep Mehta


Abstract
Ensuring the data quality of tabular datasets is essential for the efficiency of diverse downstream tabular tasks, such as summarization and fact-checking. Data-wrangling tasks address the challenges of structured data processing in order to improve the quality of tabular data. Traditional statistical methods handle numeric data efficiently but often fail to capture the semantic context of textual data in tables. Deep learning approaches are resource-intensive and require task- and dataset-specific training. To address these shortcomings, we present an automated system that leverages LLMs to generate executable code for data-wrangling tasks such as missing value imputation, error detection, and error correction. Our system aims to identify inherent patterns in the data while also leveraging external knowledge, so that it can address both memory-independent and memory-dependent tasks.
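To make the idea of generated wrangling code concrete, the following is a minimal illustrative sketch of the kind of pattern-based imputation routine such a system might emit. It is not taken from the paper: the function name `impute_by_dependency`, the column names, and the sample rows are all hypothetical, and the assumed pattern (one column's value determines another's) is just one simple instance of a pattern inferred from the data itself.

```python
from collections import Counter, defaultdict


def impute_by_dependency(rows, key, target):
    """Fill missing `target` cells with the most frequent value
    observed for the same `key` value elsewhere in the table.

    Hypothetical helper for illustration only; it assumes `key`
    (approximately) determines `target`.
    """
    # Count which target values co-occur with each key value.
    observed = defaultdict(Counter)
    for row in rows:
        if row.get(target) not in (None, ""):
            observed[row.get(key)][row[target]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left untouched
        candidates = observed.get(new_row.get(key))
        if new_row.get(target) in (None, "") and candidates:
            new_row[target] = candidates.most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Made-up sample table: "state" is missing in two rows.
rows = [
    {"city": "Albuquerque", "state": "NM"},
    {"city": "Albuquerque", "state": None},
    {"city": "Austin", "state": "TX"},
    {"city": "Boston", "state": None},
]
result = impute_by_dependency(rows, "city", "state")
```

In this sketch the second row can be filled from the first, while the "Boston" row stays empty because no other row supplies evidence for it. A cell like that cannot be recovered from the table alone and would need knowledge from outside it.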
Anthology ID:
2025.naacl-industry.70
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
949–960
URL:
https://preview.aclanthology.org/landing_page/2025.naacl-industry.70/
Cite (ACL):
Ashlesha Akella, Abhijit Manatkar, Krishnasuri Narayanam, and Sameep Mehta. 2025. CodeGenWrangler: Data Wrangling task automation using Code-Generating Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 949–960, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
CodeGenWrangler: Data Wrangling task automation using Code-Generating Models (Akella et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.naacl-industry.70.pdf