An LLM-Based Approach for Insight Generation in Data Analysis
Alberto Sánchez Pérez, Alaa Boukhary, Paolo Papotti, Luis Castejón Lozano, Adam Elwood
Abstract
Generating insightful and actionable information from databases is critical in data analysis. This paper introduces a novel approach using Large Language Models (LLMs) to automatically generate textual insights. Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables. Our framework includes a Hypothesis Generator to formulate domain-relevant questions, a Query Agent to answer such questions by generating SQL queries against a database, and a Summarization module to verbalize the insights. The insights are evaluated for both correctness and subjective insightfulness using a hybrid model of human judgment and automated metrics. Experimental results on public and enterprise databases demonstrate that our approach generates more insightful insights than other approaches while maintaining correctness.
- Anthology ID:
- 2025.naacl-long.24
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 562–582
- URL:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-long.24/
- Cite (ACL):
- Alberto Sánchez Pérez, Alaa Boukhary, Paolo Papotti, Luis Castejón Lozano, and Adam Elwood. 2025. An LLM-Based Approach for Insight Generation in Data Analysis. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 562–582, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- An LLM-Based Approach for Insight Generation in Data Analysis (Pérez et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-long.24.pdf