UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data

Han Weng; Zhou Liu; Yuanfeng Song; Xiaoming Yin; Xing Chen; Wentao Zhang

Abstract

In real-world business environments, data is stored in a variety of sources, including structured relational databases, semi-structured databases, and unstructured files. The ability to extract sound insights across these diverse sources is integral to data-driven decision-making. Existing benchmarks, however, are limited in how well they assess agents’ capabilities across these data types. To address this gap, we introduce UniDataBench, a multi-source benchmark designed to evaluate how well data analytics agents handle diverse data sources. Specifically, UniDataBench is constructed from real-life industry analysis reports, employing a pipeline to synthesize data that aligns with authentic analytical trends. It encompasses datasets spanning relational databases, CSV files, and NoSQL stores to reflect real-world business settings, and provides a unified framework for evaluating how effectively agents can explore multiple data formats, extract insights, and generate meaningful summaries and recommendations. Building on UniDataBench, we propose ReActInsight, a novel LLM-based autonomous agent that performs end-to-end analysis over diverse data sources by automatically discovering cross-source linkages, decomposing goals, and generating robust, self-correcting code to extract actionable insights. Together, our benchmark and agent provide a framework for facilitating the development of data analytics agents in real-world applications.

Anthology ID: 2026.acl-long.1556
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 33755–33780
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1556/
Cite (ACL): Han Weng, Zhou Liu, Yuanfeng Song, Xiaoming Yin, Xing Chen, and Wentao Zhang. 2026. UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33755–33780, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data (Weng et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1556.pdf
Checklist: 2026.acl-long.1556.checklist.pdf
