Yongxin Ni


2025

Data Interpreter: An LLM Agent for Data Science
Sirui Hong | Yizhang Lin | Bang Liu | Bangbang Liu | Binhao Wu | Ceyao Zhang | Danyang Li | Jiaqi Chen | Jiayi Zhang | Jinlin Wang | Li Zhang | Lingyao Zhang | Min Yang | Mingchen Zhuge | Taicheng Guo | Tuo Zhou | Wei Tao | Robert Tang | Xiangtao Lu | Xiawu Zheng | Xinbing Liang | Yaying Fei | Yuheng Cheng | Yongxin Ni | Zhibin Gou | Zongze Xu | Yuyu Luo | Chenglin Wu
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Model (LLM)-based agents have excelled in various domains but face significant challenges when applied to data science workflows due to their complex, multi-stage nature. Current LLM-based agents struggle with non-linear relationships, recursive dependencies, implicit data- and logic-dependent reasoning, and managing extensive context. In this paper, we introduce Data Interpreter, an LLM-based agent that addresses these challenges through hierarchical graph-based modeling to represent the complexity and a progressive strategy for step-by-step verification, refinement, and consistent context management. Extensive experiments confirm the effectiveness of Data Interpreter. On InfiAgent-DABench, it boosts performance by 25% (from 75.9% to 94.9%), and on machine learning and open-ended tasks, it lifts accuracy from 88% to 95% and from 60% to 97%, respectively. Moreover, our method surpasses state-of-the-art baselines by 26% on the MATH dataset. We will release the code upon publication.
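The core mechanism this abstract describes, hierarchical graph-based modeling of a data-science workflow with step-by-step verification and refinement, can be illustrated with a minimal sketch. The `TaskNode` structure, the `execute_graph` loop, and the toy load/clean/model pipeline below are hypothetical stand-ins written for illustration, not the Data Interpreter implementation.

```python
# Minimal, hypothetical sketch of a hierarchical task graph with stepwise
# verification and refinement, in the spirit of the abstract above.
# It is NOT the authors' implementation; all names are illustrative.
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

@dataclass
class TaskNode:
    name: str
    run: Callable[[Dict[str, object]], object]       # produces this node's result
    verify: Callable[[object], bool]                  # checks the result
    deps: List[str] = field(default_factory=list)     # upstream task names

def execute_graph(nodes: Dict[str, TaskNode], max_refinements: int = 2) -> Dict[str, object]:
    """Run tasks in dependency order; re-run (refine) any node whose output fails verification."""
    order = TopologicalSorter({n.name: n.deps for n in nodes.values()}).static_order()
    context: Dict[str, object] = {}                   # shared, consistent context
    for name in order:
        node = nodes[name]
        for _attempt in range(max_refinements + 1):
            result = node.run(context)
            if node.verify(result):
                break                                 # accepted; move to the next task
        context[name] = result                        # downstream tasks see this output
    return context

# Toy usage: load -> clean -> model, each step verified before the next runs.
graph = {
    "load":  TaskNode("load",  lambda ctx: [3, 1, None, 2], lambda r: isinstance(r, list)),
    "clean": TaskNode("clean", lambda ctx: [x for x in ctx["load"] if x is not None],
                      lambda r: None not in r, deps=["load"]),
    "model": TaskNode("model", lambda ctx: sum(ctx["clean"]) / len(ctx["clean"]),
                      lambda r: r > 0, deps=["clean"]),
}
print(execute_graph(graph))   # {'load': [...], 'clean': [3, 1, 2], 'model': 2.0}
```

Running the toy graph executes tasks in dependency order and re-runs any step whose output fails its check before passing results downstream, which is the essence of the progressive verification strategy the abstract describes.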

SOLAR: Serendipity Optimized Language Model Aligned for Recommendation
Zichen Yuan | Lifan Sun | Yucen Zhuang | Yue Wang | Xinyuan Song | Tianqi Xu | Siyuan Li | Junchen Fu | Youhua Li | Sirui Hong | Jiaqi Chen | Joemon M. Jose | Yongxin Ni
Findings of the Association for Computational Linguistics: EMNLP 2025

Recently, Large Language Models (LLMs) have shown strong potential in recommendation tasks due to their broad world knowledge and reasoning capabilities. However, applying them to serendipity-oriented recommendation remains challenging, mainly due to the domain gap of LLMs in modeling personalized user behavior and the scarcity of labeled serendipitous interactions. In this paper, we introduce **SOLAR** (**S**erendipity-**O**ptimized **L**anguage model **A**ligned for **R**ecommendation), a two-stage framework that addresses these challenges. To alleviate label scarcity, we adopt a weak supervision strategy: a sequential ID-based recommender generates candidate items, which are then reranked by an LLM acting as a preference judge to produce serendipity-aware pseudo-labels. To bridge the domain gap, we propose a domain-adaptive instruction tuning method (SUN) that aligns LLMs with recommendation tasks. Experiments on three real-world datasets show that **SOLAR** consistently improves both accuracy and serendipity over strong baselines, demonstrating its effectiveness in enabling more diverse, user-centric recommendations. Code and dataset are released at [https://github.com/SOLAR2025ARR/SOLAR](https://github.com/SOLAR2025ARR/SOLAR).
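As a rough illustration of the weak-supervision stage this abstract describes, the sketch below wires a candidate generator to an LLM-style preference judge that reranks candidates into serendipity-aware pseudo-labels. Both `sequential_recommender` and `llm_preference_judge` are stub assumptions introduced here for illustration (a real system would use a trained sequential ID-based model and an actual LLM prompt); this is not SOLAR's released code.

```python
# Minimal, hypothetical sketch of the weak-supervision pipeline described above:
# a sequential recommender proposes candidates, and an "LLM judge" reranks them
# into serendipity-aware pseudo-labels. Both components are illustrative stubs.
from collections import Counter
from typing import List, Tuple

def sequential_recommender(history: List[str], catalog: List[str], k: int = 5) -> List[str]:
    """Stub candidate generator: returns unseen catalog items
    (a crude proxy for a trained sequential ID-based model)."""
    seen = set(history)
    return [item for item in catalog if item not in seen][:k]

def llm_preference_judge(history: List[str], candidates: List[str]) -> List[Tuple[str, float]]:
    """Stub for an LLM judge: scores each candidate for 'serendipity', i.e., unexpected
    given the history. A real system would prompt an LLM here instead."""
    prefix_counts = Counter(item.split("/")[0] for item in history)  # e.g., "movies/inception"
    scores = []
    for item in candidates:
        category = item.split("/")[0]
        unexpectedness = 1.0 / (1 + prefix_counts[category])         # rarer category -> higher
        scores.append((item, unexpectedness))
    return sorted(scores, key=lambda x: x[1], reverse=True)

def make_pseudo_labels(history: List[str], catalog: List[str], top_n: int = 2) -> List[str]:
    """Pipeline: candidates from the recommender, reranked by the judge; the top
    items become serendipity-aware pseudo-labels for instruction tuning."""
    candidates = sequential_recommender(history, catalog)
    ranked = llm_preference_judge(history, candidates)
    return [item for item, _ in ranked[:top_n]]

history = ["movies/inception", "movies/tenet", "books/dune"]
catalog = ["movies/memento", "books/foundation", "music/ok_computer", "games/portal"]
print(make_pseudo_labels(history, catalog))   # ['music/ok_computer', 'games/portal']
```

The design point the sketch tries to capture is that the labels never come from human annotation: the recommender supplies plausible candidates, and the judge's reranking alone decides which of them count as serendipitous positives for the later instruction-tuning stage.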