FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis

Hung Quang Nguyen, Anh Phuong Trinh, Hung Phan Quoc Mai, Phong Tuan Trinh


Abstract
Despite recent advances in LLMs, Text2SQL remains challenging for complex queries in specialized domains such as finance, where database designs and reporting standards vary widely. We introduce FinStat2SQL, a lightweight pipeline that enables natural language queries over financial statements and is tailored to local reporting standards such as the Vietnamese Accounting Standards (VAS). Our multi-agent setup combines large and small language models for entity extraction, SQL generation, and self-correction, and includes a fully automatic pipeline for synthetic data generation. Using this synthetic data, we fine-tuned a 7B model that achieves 61.33% accuracy with sub-4s latency on consumer hardware, outperforming GPT-4o-mini on SQL generation. FinStat2SQL provides a scalable, cost-efficient solution for financial analysis. Our source code is publicly available at https://github.com/hung20gg/chatbot_financial_statement.
Anthology ID:
2025.inlg-main.27
Volume:
Proceedings of the 18th International Natural Language Generation Conference
Month:
October
Year:
2025
Address:
Hanoi, Vietnam
Editors:
Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
449–464
URL:
https://preview.aclanthology.org/ingest-luhme/2025.inlg-main.27/
Cite (ACL):
Hung Quang Nguyen, Anh Phuong Trinh, Hung Phan Quoc Mai, and Phong Tuan Trinh. 2025. FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis. In Proceedings of the 18th International Natural Language Generation Conference, pages 449–464, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal):
FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis (Nguyen et al., INLG 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.inlg-main.27.pdf