2025
Data Interpreter: An LLM Agent for Data Science
Sirui Hong | Yizhang Lin | Bang Liu | Bangbang Liu | Binhao Wu | Ceyao Zhang | Danyang Li | Jiaqi Chen | Jiayi Zhang | Jinlin Wang | Li Zhang | Lingyao Zhang | Min Yang | Mingchen Zhuge | Taicheng Guo | Tuo Zhou | Wei Tao | Robert Tang | Xiangtao Lu | Xiawu Zheng | Xinbing Liang | Yaying Fei | Yuheng Cheng | Yongxin Ni | Zhibin Gou | Zongze Xu | Yuyu Luo | Chenglin Wu
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Model (LLM)-based agents have excelled in various domains but face significant challenges when applied to data science workflows due to their complex, multi-stage nature. Current LLM-based agents struggle with non-linear relationships, recursive dependencies, implicit data- and logic-dependent reasoning, and managing extensive context. In this paper, we introduce Data Interpreter, an LLM-based agent that addresses these challenges through hierarchical graph-based modeling to represent the complexity and a progressive strategy for step-by-step verification, refinement, and consistent context management. Extensive experiments confirm the effectiveness of Data Interpreter. On InfiAgent-DABench, it boosts performance by 25% (from 75.9% to 94.9%), and on machine learning and open-ended tasks, it lifts accuracy from 88% to 95% and from 60% to 97%, respectively. Moreover, our method surpasses state-of-the-art baselines by 26% on the MATH dataset. We will release the code upon publication.
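The hierarchical graph-based modeling and progressive step-by-step verification described in the abstract can be pictured with a short sketch. The Python below is a minimal illustration under our own assumptions, not the authors' implementation: `Task`, `run_workflow`, `generate`, and `execute` are hypothetical stand-ins for the paper's planning, LLM code-generation, and code-execution components.

```python
# Minimal sketch (hypothetical, not the authors' code) of the idea the abstract
# describes: model a data-science workflow as a dependency graph of tasks,
# execute nodes in topological order, and verify/refine each step before
# passing its result downstream as shared context.
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    name: str
    instruction: str                      # what the LLM should write code for
    deps: list[str] = field(default_factory=list)
    result: str | None = None


def run_workflow(tasks: dict[str, Task], generate, execute, max_retries: int = 3):
    """Run tasks in dependency order with per-step verification and refinement.

    `generate(task, context)` stands in for an LLM call that writes code for
    the task; `execute(code)` stands in for running that code and returning
    (ok, output). Both are assumed interfaces, not a real API.
    """
    order = TopologicalSorter({t.name: t.deps for t in tasks.values()}).static_order()
    context: dict[str, str] = {}          # results of finished tasks, kept consistent
    for name in order:
        task = tasks[name]
        for _ in range(max_retries):
            code = generate(task, context)     # plan / refine the step
            ok, output = execute(code)         # verify by actually running it
            if ok:
                task.result = output
                context[name] = output         # feed downstream tasks
                break
        else:
            raise RuntimeError(f"task {name!r} failed after {max_retries} attempts")
    return context
```

In this sketch, the graph structure captures the non-linear dependencies between stages, while the retry loop plays the role of the progressive verification and refinement strategy.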