Yongxin Ni


2025

pdf bib
Data Interpreter: An LLM Agent for Data Science
Sirui Hong | Yizhang Lin | Bang Liu | Bangbang Liu | Binhao Wu | Ceyao Zhang | Danyang Li | Jiaqi Chen | Jiayi Zhang | Jinlin Wang | Li Zhang | Lingyao Zhang | Min Yang | Mingchen Zhuge | Taicheng Guo | Tuo Zhou | Wei Tao | Robert Tang | Xiangtao Lu | Xiawu Zheng | Xinbing Liang | Yaying Fei | Yuheng Cheng | Yongxin Ni | Zhibin Gou | Zongze Xu | Yuyu Luo | Chenglin Wu
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Model (LLM)-based agents have excelled in various domains but face significant challenges when applied to data science workflows due to their complex, multi-stage nature. Current LLM-based agents struggle with non-linear relationships, recursive dependencies, implicit data- and logic-dependent reasoning, and managing extensive context. In this paper, we introduce Data Interpreter, an LLM-based agent that addresses these challenges through hierarchical graph-based modeling to represent the complexity and a progressive strategy for step-by-step verification, refinement, and consistent context management. Extensive experiments confirm the effectiveness of Data Interpreter. On InfiAgent-DABench, it boosts performance by 25% (from 75.9% to 94.9%), and on machine learning and open-ended tasks, it lifts accuracy from 88% to 95% and from 60% to 97%, respectively. Moreover, our method surpasses state-of-the-art baselines by 26% on the MATH dataset. We will release the code upon publication.