Yongxin Ni

2025

Large Language Model (LLM)-based agents have excelled in various domains but face significant challenges when applied to data science workflows because of their complex, multi-stage nature. Current LLM-based agents struggle to handle non-linear relationships, recursive dependencies, and implicit data- and logic-dependent reasoning, and to manage extensive context. In this paper, we introduce Data Interpreter, an LLM-based agent that addresses these challenges through two mechanisms: hierarchical graph-based modeling to represent this complexity, and a progressive strategy for step-by-step verification, refinement, and consistent context management. Extensive experiments confirm the effectiveness of Data Interpreter. On InfiAgent-DABench, it improves accuracy by a relative 25% (from 75.9% to 94.9%), and on machine learning and open-ended tasks, it lifts accuracy from 88% to 95% and from 60% to 97%, respectively. Moreover, our method surpasses state-of-the-art baselines by 26% on the MATH dataset. We will release the code upon publication.
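The abstract does not specify how the hierarchical graph is executed. As a rough illustration only, the Python sketch below models a workflow as a dependency graph of steps, runs them in topological order, verifies each result before it enters a shared context, and retries a failed step as a stand-in for refinement. `Task`, `execute_graph`, `max_retries`, and the toy pipeline are hypothetical names, not Data Interpreter's API; in the actual system an LLM would generate and revise each step's code.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class Task:
    """Hypothetical node in a task graph (not the authors' data structure)."""
    name: str
    deps: List[str]
    run: Callable[[Dict[str, Any], int], Any]  # (shared context, attempt) -> result
    check: Callable[[Any], bool]               # verification of the step's result


def execute_graph(tasks: List[Task], max_retries: int = 2) -> Dict[str, Any]:
    """Execute tasks in dependency order, verifying each step before continuing."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.deps) for t in tasks}  # node -> its predecessors
    context: Dict[str, Any] = {}  # one shared context passed to every step
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        for attempt in range(max_retries + 1):
            result = task.run(context, attempt)  # a retry stands in for refinement
            if task.check(result):
                context[name] = result  # only verified results enter the context
                break
        else:
            raise RuntimeError(f"task {name!r} failed verification")
    return context


# Toy pipeline: "clean" fails its check on the first attempt, then succeeds.
tasks = [
    Task("load", [], lambda ctx, a: [3, 1, 2], lambda r: len(r) > 0),
    Task("clean", ["load"],
         lambda ctx, a: ctx["load"] if a == 0 else sorted(ctx["load"]),
         lambda r: r == sorted(r)),
    Task("mean", ["clean"],
         lambda ctx, a: sum(ctx["clean"]) / len(ctx["clean"]),
         lambda r: r > 0),
]
ctx = execute_graph(tasks)
print(ctx["clean"], ctx["mean"])  # [1, 2, 3] 2.0
```

Keeping unverified results out of `context` is one simple way to read "consistent context management": later steps only ever see outputs that passed their checks.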

Recently, Large Language Models (LLMs) have shown strong potential in recommendation tasks owing to their broad world knowledge and reasoning capabilities. However, applying them to serendipity-oriented recommendation remains challenging, mainly because of a domain gap that limits LLMs in modeling personalized user behavior and the scarcity of labeled serendipitous interactions. In this paper, we introduce **SOLAR** (**S**erendipity-**O**ptimized **L**anguage model **A**ligned for **R**ecommendation), a two-stage framework that addresses these challenges. To alleviate label scarcity, we adopt a weak supervision strategy: a sequential ID-based recommender generates candidate items, which are then reranked by an LLM acting as a preference judge to produce serendipity-aware pseudo-labels. To bridge the domain gap, we propose a domain-adaptive instruction tuning method (SUN) that aligns LLMs with recommendation tasks. Experiments on three real-world datasets show that **SOLAR** consistently improves both accuracy and serendipity over strong baselines, demonstrating its effectiveness in producing more diverse, user-centric recommendations. Code and dataset are released at [https://github.com/SOLAR2025ARR/SOLAR](https://github.com/SOLAR2025ARR/SOLAR).
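To make the two-stage weak-supervision step concrete, here is a minimal Python sketch under stated assumptions: a sequential recommender proposes `k` candidates per user, a judge scores each candidate, and the top-scored item becomes that user's pseudo-label. `make_pseudo_labels`, `toy_recommend`, and `toy_judge` are hypothetical; the toy judge is a simple genre-novelty heuristic standing in for the LLM preference judge, not SOLAR's actual prompt or scoring.

```python
from typing import Callable, Dict, List, Sequence

Recommender = Callable[[Sequence[str], int], List[str]]
Judge = Callable[[Sequence[str], str], float]


def make_pseudo_labels(histories: Dict[str, Sequence[str]],
                       recommend: Recommender, judge: Judge,
                       k: int = 10) -> Dict[str, str]:
    """Recall candidates, rerank them with a judge, keep the top item per user."""
    labels: Dict[str, str] = {}
    for user, history in histories.items():
        candidates = recommend(history, k)  # stage 1: sequential ID-based recall
        # Stage 2: rerank by the judge's score (stand-in for an LLM call).
        # sorted() is stable, so ties keep the recommender's original order.
        ranked = sorted(candidates, key=lambda item: judge(history, item),
                        reverse=True)
        if ranked:
            labels[user] = ranked[0]  # pseudo-label for later instruction tuning
    return labels


# Hypothetical stand-ins: a fixed candidate list and a judge that favours
# items from a genre the user has not interacted with yet.
def toy_recommend(history: Sequence[str], k: int) -> List[str]:
    return ["drama:c", "jazz:d", "scifi:e"][:k]


def toy_judge(history: Sequence[str], item: str) -> float:
    seen = {h.split(":")[0] for h in history}
    return 0.0 if item.split(":")[0] in seen else 1.0


labels = make_pseudo_labels({"u1": ["drama:a", "drama:b"]}, toy_recommend, toy_judge)
print(labels)  # {'u1': 'jazz:d'}
```

Separating recall from judging keeps the expensive judge call limited to `k` candidates per user, which is one plausible reason for a recommender-then-LLM design.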