Xing Chen
2026
UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data
Han Weng | Zhou Liu | Yuanfeng Song | Xiaoming Yin | Xing Chen | Wentao Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In real-world business environments, data is stored in a variety of sources, including structured relational databases, semi-structured databases, and unstructured files. The ability to extract reasonable insights across these diverse sources is integral to data-driven decision-making. Existing benchmarks, however, are limited in assessing agents’ capabilities across these diverse data types. To address this gap, we introduce UniDataBench, a multi-source benchmark designed to evaluate the performance of data analytics agents in handling diverse data sources. Specifically, UniDataBench is constructed based on real-life industry analysis reports, employing a pipeline to synthesize data that aligns with authentic analytical trends. It encompasses diverse datasets spanning relational databases, CSV files, and NoSQL stores to reflect real-world business settings, and provides a unified framework for evaluating how effectively agents can explore multiple data formats, extract insights, and generate meaningful summaries and recommendations. Based on UniDataBench, we propose a novel LLM-based agent named ReActInsight, an autonomous agent that performs end-to-end analysis over diverse data sources by automatically discovering cross-source linkages, decomposing goals, and generating robust, self-correcting code to extract actionable insights. Our benchmark and agent together provide a framework for facilitating the development of data analytics agents in real-world applications.
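The abstract does not spell out how ReActInsight discovers cross-source linkages. As a rough illustration only, the sketch below assumes that a linkage is a pair of columns from two different sources whose distinct values overlap strongly, scored by Jaccard similarity. The function `candidate_join_keys`, the 0.5 threshold, and the toy tables are all hypothetical and are not taken from the paper.

```python
from itertools import product


def candidate_join_keys(left, right, threshold=0.5):
    """Rank column pairs across two tables by overlap of their distinct values.

    `left` and `right` map column names to lists of values (e.g. one table read
    from a relational database, one from a CSV file). The Jaccard score used
    here is a hypothetical heuristic, not ReActInsight's documented method.
    """
    scored = []
    for (lname, lvals), (rname, rvals) in product(left.items(), right.items()):
        a, b = set(lvals), set(rvals)
        if not a or not b:
            continue  # an empty column cannot link anything
        score = len(a & b) / len(a | b)
        if score >= threshold:
            scored.append((lname, rname, score))
    # Highest-overlap pairs first.
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical toy sources.
orders = {"order_id": [1, 2, 3], "cust": ["c1", "c2", "c1"]}  # e.g. from SQL
customers = {"customer_id": ["c1", "c2", "c3"], "region": ["EU", "US", "EU"]}  # e.g. from CSV
print(candidate_join_keys(orders, customers))
```

A value-overlap score is cheap and format-agnostic because it only needs column values, so it works as a first pass; an agent could combine it with column names, types, and LLM reasoning before committing to a join.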
VizoMem: A Visual-Textual Memory Framework for Efficient Long-Horizon Reasoning
Weijie Liang | Yuanfeng Song | Xing Chen | Caleb Chen Cao | Sirui Han | Yike Guo
Findings of the Association for Computational Linguistics: ACL 2026
Agentic systems built upon large language models (LLMs) increasingly depend on long-context modeling to support document understanding, long-term memory recall, and multi-step reasoning. However, extending context windows incurs substantial computational and memory overhead, significantly limiting the scalability and practicality of long-context LLM-based agents. Recent studies suggest that visual representations can serve as an effective medium for compressing and organizing long textual content. Motivated by this insight, we propose VizoMem, a novel visual memory framework for agentic systems. In this framework, textual memories are pre-rendered into structured images and stored as visual notes, enabling compact and persistent memory representations. Moving beyond standard vision-language models like Glyph, we pioneer a specialized retrieval system designed for large-scale visual memory. Our innovation lies in the construction of a dedicated dataset and the development of a highly efficient retrieval model that repurposes foundational vision-language encoders to navigate complex, text-heavy visual environments. Experiments on public datasets demonstrate that our approach significantly reduces token consumption while preserving effective long-term memory recall, highlighting its potential as a scalable alternative to conventional long-context modeling.
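To make the retrieval step more concrete, the following sketch assumes each visual note has already been encoded into a fixed-length vector by some vision-language encoder (not shown) and ranks notes by cosine similarity to an encoded query. This is generic dense retrieval, not VizoMem's trained retrieval model; `retrieve_notes`, the note IDs, and the vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_notes(query_vec, note_index, k=2):
    """Return the IDs of the k stored notes most similar to the query vector."""
    ranked = sorted(
        note_index, key=lambda nid: cosine(query_vec, note_index[nid]), reverse=True
    )
    return ranked[:k]


# Hypothetical index: note ID -> embedding of the rendered note image.
index = {
    "note_meeting": [0.9, 0.1, 0.0],
    "note_budget": [0.1, 0.9, 0.1],
    "note_travel": [0.0, 0.2, 0.9],
}
print(retrieve_notes([0.8, 0.2, 0.0], index, k=1))  # ['note_meeting']
```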
DataSage: Multi-agent Collaboration for Insight Discovery with External Knowledge Retrieval, Multi-role Debating, and Multi-path Reasoning
Xiaochuan Liu | Yuanfeng Song | Xiaoming Yin | Xing Chen
Findings of the Association for Computational Linguistics: ACL 2026
In today’s data-driven era, fully automated end-to-end data analytics, particularly insight discovery, is critical for uncovering actionable insights that help organizations make effective decisions. With the rapid advancement of large language models (LLMs), LLM-driven agents have emerged as a promising paradigm for automating insight discovery. However, existing data insight agents remain limited in several key aspects and often fail to deliver satisfactory results due to: (1) insufficient utilization of domain knowledge, (2) shallow analytical depth, and (3) error-prone code generation. To address these issues, we propose DataSage, a novel multi-agent framework with three key features: external knowledge retrieval to enrich the analytical context, a multi-role debating mechanism to simulate diverse analytical perspectives and deepen the analysis, and multi-path reasoning to improve the accuracy of the generated code and insights. Extensive experiments on InsightBench demonstrate that DataSage consistently outperforms existing data insight agents across all difficulty levels, improving the insight-level and summary-level metrics by 7.5% and 13.9%, respectively. It offers an effective solution for automated data insight discovery.
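The abstract does not detail DataSage's multi-path reasoning. One familiar way to use multiple paths to improve accuracy is self-consistency-style voting: generate several candidate solutions, run them, drop those that crash, and keep the most common result. The sketch below shows that idea with plain Python callables standing in for LLM-generated code; `multi_path_answer` and the candidate functions are hypothetical.

```python
from collections import Counter


def multi_path_answer(candidates, *args):
    """Run several candidate solutions and return the majority result.

    Each candidate stands in for one reasoning path's generated code. Paths
    that raise an exception are discarded; results must be hashable.
    """
    results = []
    for fn in candidates:
        try:
            results.append(fn(*args))
        except Exception:
            continue  # treat a crashing path as a failed reasoning path
    if not results:
        raise RuntimeError("all reasoning paths failed")
    value, _count = Counter(results).most_common(1)[0]
    return value


# Hypothetical data and candidate "generated code" for "average sales".
sales = [120, 80, 200, 150]
paths = [
    lambda xs: sum(xs) / len(xs),        # correct mean
    lambda xs: sum(xs) / (len(xs) - 1),  # off-by-one bug
    lambda xs: sum(xs) / len(xs),        # an independent path that agrees
    lambda xs: xs[10],                   # IndexError, discarded
]
print(multi_path_answer(paths, sales))  # 137.5
```

Voting only helps when paths fail in different ways; if most paths share the same bug, the majority answer is still wrong, which is one reason richer aggregation schemes exist.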
DataSeer: A Manager-Centric Collaborative Multi-Agent Framework with Multi-Branch Reasoning for Automated Insight Discovery
Suchen Liu | Yuanfeng Song | Jun Gao | Xing Chen
Findings of the Association for Computational Linguistics: ACL 2026
The growth of complex data fuels demand for automated insight discovery. While LLMs and agent technologies have advanced data analysis, existing methods struggle to maintain contextual coherence, suffer from limited coverage due to single-path exploration, and rely on rigid planning that fails to adapt to dynamic data discovery. We propose DataSeer, a collaborative multi-agent framework for automated insight discovery. Our first contribution is a Manager-Centric Collaborative Framework, in which the Manager ensures cross-episode contextual coherence through a dual-layer memory system with compression, consolidation, and retrieval, alongside dynamic prompt editing, and coordinates the overall process between the Planner and the Executor. Second, we optimize the planning and execution components: the Planner employs multi-role discussion for adaptive sub-goal generation and plan refinement, while the Executor is endowed with tactical autonomy for exploratory execution and incorporates real-time multi-dimensional self-assessment to guarantee insight quality. Third, we design Multi-Branch Reasoning, which executes multiple discovery trajectories and synthesizes their outcomes through LLM-based aggregation, improving coverage and reducing single-path stochasticity. Experiments on InsightBench and InsightEval show that DataSeer outperforms baselines, achieving improvements of 18.7% and 12.1% in insight-level scores, and 11.6% and 10.3% in summary-level scores, respectively.
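DataSeer synthesizes branch outcomes with LLM-based aggregation. As a non-LLM stand-in for illustration only, the sketch below merges insight lists from several branches and drops near-duplicates using token-set Jaccard similarity. The helper names, the 0.6 threshold, and the sample insights are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of an insight string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def merge_branches(branches, threshold=0.6):
    """Merge insight lists from several branches, skipping near-duplicates.

    Insights are kept in first-seen order; a later insight is dropped if it is
    at least `threshold` similar to one already kept. This is a hypothetical
    stand-in for DataSeer's LLM-based aggregation.
    """
    merged, seen = [], []
    for branch in branches:
        for insight in branch:
            t = tokens(insight)
            if any(jaccard(t, s) >= threshold for s in seen):
                continue
            merged.append(insight)
            seen.append(t)
    return merged


branch_a = ["Sales peaked in Q4", "Returns rose in March"]
branch_b = ["Sales peaked in Q4 2023", "Region EU has the highest churn"]
print(merge_branches([branch_a, branch_b]))
# ['Sales peaked in Q4', 'Returns rose in March', 'Region EU has the highest churn']
```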
InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents
Zhenghao Zhu | Yuanfeng Song | Xing Chen | Chengzhong Liu | Cui Yakun | Caleb Chen Cao | Sirui Han | Yike Guo
Findings of the Association for Computational Linguistics: ACL 2026
Data analysis has become an indispensable part of scientific research. To discover the latent knowledge and insights hidden within massive datasets, we need to perform deep exploratory analysis to realize their full value. With the advent of large language models (LLMs) and multi-agent systems, more and more researchers are using these technologies for insight discovery. However, there are few benchmarks for evaluating insight discovery capabilities. Although InsightBench is one of the most comprehensive existing frameworks, it suffers from many critical flaws, including format inconsistencies, poorly conceived objectives, and redundant insights. These issues may significantly affect data quality and the evaluation of agents. To address them, we thoroughly investigate the shortcomings of InsightBench and propose essential criteria for a high-quality insight benchmark. Guided by these criteria, we develop a data-curation pipeline to construct a new dataset named InsightEval. We further introduce a novel metric to measure the exploratory performance of agents. Through extensive experiments on InsightEval, we highlight prevailing challenges in automated insight discovery and present key findings to guide future research in this promising direction.
2025
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward
Han Weng | Puzhen Wu | Cui Longjie | Yi Zhan | Boyi Liu | Yuanfeng Song | Dun Zeng | Yingxiang Yang | Qianru Zhang | Dong Huang | Xiaoming Yin | Yang Sun | Xing Chen
Findings of the Association for Computational Linguistics: EMNLP 2025
Reinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. However, existing methods often rely on execution-based or LLM-based Bradley–Terry reward models. The former suffers from high execution latency caused by repeated database calls, whereas the latter imposes substantial GPU memory overhead, both of which significantly hinder the efficiency and scalability of RL pipelines. To this end, we propose a novel reward model framework for RL-based Text-to-SQL named Graph-Reward-SQL, which employs the GMNScore outcome reward model. We leverage SQL graph representations to provide accurate reward signals while significantly reducing time cost and GPU memory usage. Building on this foundation, we further introduce StepRTM, a stepwise reward model that provides intermediate supervision over Common Table Expression (CTE) subqueries. This encourages both functional correctness and readability of SQL. Extensive comparative and ablation experiments on standard benchmarks, including Spider and BIRD, demonstrate that our method consistently outperforms existing reward models.
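GMNScore and StepRTM rely on SQL graph representations and a learned graph matching network, which the abstract does not describe in enough detail to reproduce. The sketch below only illustrates the shape of a stepwise, execution-free reward: it assumes the CTE subqueries have already been extracted as strings and substitutes a crude token-overlap similarity for GMNScore. `stepwise_rewards` and the example queries are hypothetical.

```python
import re


def sql_tokens(sql):
    """Lowercased SQL tokens: identifiers/keywords, integers, and a few operators."""
    return set(re.findall(r"[a-z_][a-z0-9_]*|\d+|[<>=*]", sql.lower()))


def similarity(a, b):
    """Token-set Jaccard similarity; a crude stand-in for a graph matching score."""
    ta, tb = sql_tokens(a), sql_tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def stepwise_rewards(pred_steps, gold_steps):
    """One reward per predicted CTE: its best similarity to any gold CTE.

    No query is executed, so each reward costs only a set comparison.
    """
    return [max(similarity(p, g) for g in gold_steps) for p in pred_steps]


# Hypothetical CTE bodies, assumed to be already split out of WITH clauses.
gold_ctes = [
    "SELECT id, amount FROM orders WHERE amount > 100",
    "SELECT COUNT(*) FROM big",
]
pred_ctes = [
    "SELECT id, amount FROM orders WHERE amount > 100",
    "SELECT SUM(amount) FROM big",
]
print(stepwise_rewards(pred_ctes, gold_ctes))
```

Per-step scores give the policy credit for a correct intermediate CTE even when a later step is wrong; how informative that credit is depends entirely on the similarity function, which here is far weaker than graph matching.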
2022
A Corpus-based Study of Corporate Image Represented in Corporate Social Responsibility Report: A Case Study of China Mobile and Vodafone
Xing Chen | Liang Xu
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference
By examining high-frequency nouns, verbs, and keywords, the present study probes the similarities and differences between the corporate images represented in the Corporate Social Responsibility (CSR) reports of China Mobile and Vodafone. The results suggest that: 1) both China Mobile and Vodafone prefer positive words such as improve, support, and service to shape a positive, approachable, and easy-going corporate image, as well as an image of prioritizing environmental sustainability and people's well-being; 2) the CSR reports of China Mobile contain the keywords poverty and alleviation, indicating that China Mobile is pragmatic, collaborative, and active in assuming responsibility for social events; 3) the CSR reports of Vodafone contain keywords such as privacy, women, and global, as well as the names of other countries, showing that Vodafone is enterprising, globalized, and attentive to the development of women; 4) these differences might be related to the ideology and social culture of Chinese and British companies. This study may contribute to understanding the function of CSR reports and offer helpful implications for broadening research on corporate image.
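The abstract does not state which keyness statistic was used to identify keywords. A common choice in corpus linguistics is Dunning's log-likelihood (G²), which compares a word's frequency in a target corpus against a reference corpus. The sketch below assumes that statistic and uses tiny made-up token lists in place of the actual CSR reports; `keywords` and `log_likelihood` are hypothetical helper names.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G2 for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens (0 * ln 0 taken as 0)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, reference_tokens, top=3):
    """Rank words over-represented in the target corpus by G2 (positive keywords only)."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(t.values()), sum(r.values())
    scored = []
    for w, a in t.items():
        b = r.get(w, 0)
        if a / c > b / d:  # keep words relatively more frequent in the target
            scored.append((w, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top]


# Hypothetical toy corpora standing in for the two companies' reports.
target = "poverty alleviation poverty service support improve".split()
reference = "privacy women global service support improve".split()
print([w for w, _ in keywords(target, reference)])  # ['poverty', 'alleviation']
```

Words shared at equal relative frequency (here service, support, improve) score as non-keywords, which mirrors the abstract's split between words both companies favor and words distinctive to one of them.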