UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data
Han Weng, Zhou Liu, Yuanfeng Song, Xiaoming Yin, Xing Chen, Wentao Zhang
Abstract
In real-world business environments, data is stored in a variety of sources, including structured relational databases, semi-structured databases, and unstructured files. The ability to extract reasonable insights across these diverse sources is integral to data-driven decision-making. Existing benchmarks, however, are limited in assessing agents’ capabilities across these diverse data types. To address this gap, we introduce UniDataBench, a multi-source benchmark designed to evaluate the performance of data analytics agents in handling diverse data sources. Specifically, UniDataBench is constructed based on real-life industry analysis reports, employing a pipeline to synthesize data that aligns with authentic analytical trends. It encompasses diverse datasets spanning relational databases, CSV files, and NoSQL stores to reflect real-world business settings, and provides a unified framework for evaluating how effectively agents can explore multiple data formats, extract insights, and generate meaningful summaries and recommendations. Based on UniDataBench, we propose a novel LLM-based agent named ReActInsight, an autonomous agent that performs end-to-end analysis over diverse data sources by automatically discovering cross-source linkages, decomposing goals, and generating robust, self-correcting code to extract actionable insights. Our benchmark and agent together provide a framework for facilitating the development of data analytics agents in real-world applications.
- Anthology ID:
- 2026.acl-long.1556
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 33755–33780
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1556/
- Cite (ACL):
- Han Weng, Zhou Liu, Yuanfeng Song, Xiaoming Yin, Xing Chen, and Wentao Zhang. 2026. UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33755–33780, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data (Weng et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1556.pdf