Xinbing Liang
2025
Data Interpreter: An LLM Agent for Data Science
Sirui Hong | Yizhang Lin | Bang Liu | Bangbang Liu | Binhao Wu | Ceyao Zhang | Danyang Li | Jiaqi Chen | Jiayi Zhang | Jinlin Wang | Li Zhang | Lingyao Zhang | Min Yang | Mingchen Zhuge | Taicheng Guo | Tuo Zhou | Wei Tao | Robert Tang | Xiangtao Lu | Xiawu Zheng | Xinbing Liang | Yaying Fei | Yuheng Cheng | Yongxin Ni | Zhibin Gou | Zongze Xu | Yuyu Luo | Chenglin Wu
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Model (LLM)-based agents have excelled in various domains but face significant challenges when applied to data science workflows due to their complex, multi-stage nature. Current LLM-based agents struggle with non-linear relationships, recursive dependencies, implicit data- and logic-dependent reasoning, and managing extensive context. In this paper, we introduce Data Interpreter, an LLM-based agent that addresses these challenges through hierarchical graph-based modeling to represent the complexity and a progressive strategy for step-by-step verification, refinement, and consistent context management. Extensive experiments confirm the effectiveness of Data Interpreter. On InfiAgent-DABench, it boosts performance by 25% (from 75.9% to 94.9%), and on machine learning and open-ended tasks, it lifts accuracy from 88% to 95% and from 60% to 97%, respectively. Moreover, our method surpasses state-of-the-art baselines by 26% on the MATH dataset. We will release the code upon publication.
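The abstract describes modeling a data science workflow as a hierarchical task graph that is verified and refined step by step. The sketch below is only an illustration of that idea in generic Python, assuming simple callable hooks; the names (TaskNode, generate_code, execute, verify, refine) are hypothetical stand-ins, not the released Data Interpreter API.

```python
# Minimal sketch, assuming a task graph with step-by-step verification and
# refinement as described in the abstract. Hook functions are hypothetical.
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    name: str
    instruction: str
    depends_on: list = field(default_factory=list)  # upstream task names
    code: str = ""
    result: str = ""
    verified: bool = False

def topological_order(nodes):
    """Order tasks so every node runs after its dependencies."""
    by_name = {n.name: n for n in nodes}
    seen, order = set(), []
    def visit(node):
        if node.name in seen:
            return
        for dep in node.depends_on:
            visit(by_name[dep])
        seen.add(node.name)
        order.append(node)
    for node in nodes:
        visit(node)
    return order

def run_plan(nodes, generate_code, execute, verify, refine, max_retries=3):
    """Execute the task graph in dependency order, verifying and refining
    each step before its results are passed downstream as context."""
    done = {}
    for node in topological_order(nodes):
        context = [done[d].result for d in node.depends_on]
        node.code = generate_code(node.instruction, context)
        for _ in range(max_retries):
            node.result = execute(node.code)
            if verify(node.instruction, node.result):
                node.verified = True
                break
            node.code = refine(node.code, node.result)  # progressive refinement
        done[node.name] = node
    return done
```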
Self-Supervised Prompt Optimization
Jinyu Xiang | Jiayi Zhang | Zhaoyang Yu | Xinbing Liang | Fengwei Teng | Jinhao Tu | Fashen Ren | Xiangru Tang | Sirui Hong | Chenglin Wu | Yuyu Luo
Findings of the Association for Computational Linguistics: EMNLP 2025
Well-designed prompts are crucial for enhancing Large Language Models' (LLMs) reasoning capabilities while aligning their outputs with task requirements across diverse domains. However, manually designed prompts require expertise and iterative experimentation. While existing prompt optimization methods aim to automate this process, they rely heavily on external references such as ground truth or human feedback, limiting their applicability in real-world scenarios where such data is unavailable or costly to obtain. To address this, we propose Self-Supervised Prompt Optimization (SPO), a cost-efficient framework that discovers effective prompts for both closed and open-ended tasks without requiring external references. Motivated by the observation that prompt quality manifests directly in LLM outputs and that LLMs can effectively assess adherence to task requirements, we derive evaluation and optimization signals purely from output comparisons. Specifically, SPO selects superior prompts through pairwise output comparisons evaluated by an LLM evaluator, followed by an LLM optimizer that aligns outputs with task requirements. Extensive experiments demonstrate that SPO outperforms state-of-the-art prompt optimization methods, achieving comparable or superior results with significantly lower costs (e.g., 1.1% to 5.6% of existing methods) and fewer samples (e.g., three samples).
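The abstract outlines a loop in which an LLM optimizer proposes a revised prompt and an LLM evaluator picks the better prompt via pairwise output comparisons, with no ground truth. The sketch below illustrates one such round under that reading; the function names and the plain string-in, string-out llm callable are assumptions, not the authors' released implementation.

```python
# Minimal sketch of one SPO-style round, assuming llm is a str -> str callable.
# All helper names here are hypothetical illustrations of the abstract.
def spo_round(llm, prompt, samples, task_requirements):
    """Propose a revised prompt, then keep whichever prompt the LLM evaluator
    prefers on pairwise output comparisons over a few samples."""
    # LLM optimizer: rewrite the current prompt toward the task requirements.
    candidate = llm(
        f"Task requirements: {task_requirements}\n"
        f"Current prompt: {prompt}\n"
        "Rewrite the prompt so its outputs better satisfy the requirements."
    )

    wins = 0
    for sample in samples:  # e.g. only three samples, per the abstract
        out_old = llm(f"{prompt}\n\n{sample}")
        out_new = llm(f"{candidate}\n\n{sample}")
        # LLM evaluator: pairwise comparison, no external reference needed.
        verdict = llm(
            f"Requirements: {task_requirements}\n"
            f"Output A: {out_old}\nOutput B: {out_new}\n"
            "Which output better satisfies the requirements? Answer A or B."
        )
        wins += verdict.strip().upper().startswith("B")

    # Select the superior prompt by majority of pairwise comparisons.
    return candidate if wins > len(samples) / 2 else prompt
```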
Co-authors
- Sirui Hong 2
- Yuyu Luo 2
- Chenglin Wu 2
- Jiayi Zhang 2
- Jiaqi Chen 1
- Yuheng Cheng 1
- Yaying Fei 1
- Zhibin Gou 1
- Taicheng Guo 1
- Danyang Li 1
- Yizhang Lin 1
- Bang Liu 1
- Bangbang Liu 1
- Xiangtao Lu 1
- Yongxin Ni 1
- Fashen Ren 1
- Robert Tang 1
- Xiangru Tang 1
- Wei Tao 1
- Fengwei Teng 1
- Jinhao Tu 1
- Jinlin Wang 1
- Binhao Wu (吴斌浩) 1
- Jinyu Xiang 1
- Zongze Xu 1
- Min Yang 1
- Zhaoyang Yu 1
- Ceyao Zhang 1
- Li Zhang 1
- Lingyao Zhang 1
- Xiawu Zheng 1
- Tuo Zhou 1
- Mingchen Zhuge 1