Junkun Qiu
2026
Tree-Notebook: A Context-Aware Agent with Tree Search and Entropy-Aware Data Shadow for Interactive Data Science
Junkun Qiu | Min Huang | Qinghai Miao
Findings of the Association for Computational Linguistics: ACL 2026
While LLM-based agents have emerged as a focal point for automating data science tasks, they continue to grapple with inefficient context management, "silent failures" (where code runs without error but fails to meet the task objectives), and error propagation inherent in sequential generation. In this paper, we propose Tree-Notebook, an agentic framework designed to mimic the iterative cognitive process of human data scientists. At its core, Tree-Notebook conceptualizes Jupyter Notebook cells as nodes within a tree structure, facilitating organized and efficient context retrieval. We formalize the task-solving process as a Partially Observable Markov Decision Process (POMDP) over a dynamic tree, utilizing an entropy-based information gain function for path evaluation to enhance adaptability in real-world environments. Furthermore, we introduce the "Data Shadow" system, which resolves silent failures by tracking data distributions, provenance, and semantic constraints in real time. Experimental results demonstrate that Tree-Notebook achieves state-of-the-art (SOTA) performance on both InfiAgent-DABench and DSBench. To further evaluate robustness, we introduce an augmented version of InfiAgent-DABench that simulates complex environments, on which Tree-Notebook maintains its SOTA standing. Code is available at: https://github.com/QJK-BUAA/Tree-Notebook
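To make the two central ideas in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "context retrieval" for a cell can be approximated by collecting the code on the root-to-node path, and that "entropy-based information gain" can be approximated as the reduction in Shannon entropy between two belief distributions over candidate hypotheses. The names `CellNode`, `context`, `entropy`, and `information_gain` are hypothetical and do not come from the Tree-Notebook repository.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CellNode:
    """One notebook cell stored as a tree node (hypothetical structure)."""
    code: str
    # Excluded from repr/compare to avoid recursing through parent/child cycles.
    parent: Optional["CellNode"] = field(default=None, repr=False, compare=False)
    children: list["CellNode"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, code: str) -> "CellNode":
        """Attach a new cell below this one and return it."""
        child = CellNode(code=code, parent=self)
        self.children.append(child)
        return child

    def context(self) -> list[str]:
        """Return the code of every cell on the path from the root to this node."""
        path = []
        node: Optional["CellNode"] = self
        while node is not None:
            path.append(node.code)
            node = node.parent
        return path[::-1]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(prior: list[float], posterior: list[float]) -> float:
    """Entropy reduction from prior to posterior (an assumed scoring rule)."""
    return entropy(prior) - entropy(posterior)


# Build a small tree: one root cell with two alternative branches.
root = CellNode("import pandas as pd")
load = root.add_child("df = pd.read_csv('data.csv')")
alt = root.add_child("df = pd.read_excel('data.xlsx')")
clean = load.add_child("df = df.dropna()")

# Only the cells on the chosen branch are retrieved as context.
branch_context = clean.context()

# Belief over four hypotheses narrows to two after executing a cell.
gain = information_gain([0.25, 0.25, 0.25, 0.25], [0.5, 0.5, 0.0, 0.0])
```

In this sketch, sibling branches such as `alt` never enter the context of `clean`, which is one plausible way a tree layout could keep prompts short. A search procedure could then prefer the branch whose execution yields the larger `gain`.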